A MULTI-OBJECTIVE COMBINATORIAL OPTIMISATION FRAMEWORK FOR LARGE SCALE HIERARCHICAL POPULATION SYNTHESIS

Imran Mahmood Nicholas Bishop Anisoara Calinescu Michael Wooldridge
Department of Computer Science
University of Oxford
Wolfson Building
Parks Road Oxford OX1 3QD UK
University of Cambridge
Trumpington Street
Cambridge CB2 1PZ UK
ABSTRACT

In agent-based simulations, synthetic populations of agents are commonly used to represent the structure, behaviour, and interactions of individuals. However, generating a synthetic population that accurately reflects real population statistics is a challenging task, particularly when performed at scale. In this paper, we propose a multi-objective combinatorial optimisation technique for large-scale population synthesis. We demonstrate the effectiveness of our approach by generating a synthetic population for selected regions and validating it against contingency tables from real population data. Our approach supports complex hierarchical structures between individuals and households, is scalable to large populations, and achieves low contingency table reconstruction error. Hence, it provides a useful tool for policymakers and researchers simulating the dynamics of complex populations.

KEYWORDS

Agent-based simulations, hierarchical population synthesis, multi-objective combinatorial optimisation, genetic algorithms.

INTRODUCTION

Population synthesis plays a crucial role in generating meaningful emergent structure from agent-based simulations. Common applications include urban planning, transportation and public health modelling Smith et al. (2017). A Synthetic Population (SP) is a simulated population that matches key demographic, social, economic, and geographic characteristics of a real-world population. SPs assimilate real-world data, which are often limited, sensitive, unavailable, or costly to obtain Barthelemy and Toint (2013); Hörl and Balac (2021), for use in modelling and policy scenario testing. They are integral to initialising agent-based simulations (ABS) because of their realism, privacy preservation, flexibility (a flexible synthetic population algorithm can adjust its parameters and modelling assumptions to account for a variety of available data and different modelling goals and objectives), and reproducibility Ye et al. (2009). ABS is a paradigm that supports the study of complex adaptive systems through a systematic bottom-up abstraction of the system, in which the behaviour of individual agents and their interactions are studied to understand and predict the dynamics of the system as a whole Macal and North (2005); Bonabeau (2002); Mahmood et al. (2022). Agent-based simulations are used to explore and evaluate different assumptions, interventions, or policies Wu et al. (2022). This study aims to: (1) develop a methodology for generating synthetic populations at a selected scale and region that accurately match the aggregate demographic characteristics of the target population and respect its hierarchical structure; (2) demonstrate the flexibility of the proposed approach in addressing diverse simulation requirements; and (3) evaluate the synthesised population in terms of accuracy and computational efficiency. To achieve these objectives, we formulate population synthesis as a multi-objective combinatorial optimisation problem and solve it using the Non-dominated Sorting Genetic Algorithm II (NSGA-II) Deb et al. (2002).
NSGA-II combines genetic algorithms with non-dominated sorting to search a combinatorial space efficiently for Pareto-optimal solutions. The optimisation objectives consist of individual demographic and spatial distributions, and these objectives can be weighted according to the simulation context. The key contributions of this paper are as follows:

  • The development of a novel methodology for generating country-scale synthetic populations that accurately represent the demographic structure, using multi-objective combinatorial optimisation techniques.

  • The assessment of the representativeness and accuracy of the generated population, the scalability of the approach, and the computational efficiency of the generation process.

  • Presentation of a case study demonstrating the generation of a synthetic population for selected regions in the city of Oxford, providing insights into the practical implementation of the proposed methodology.

  • Discussion of the advantages of our approach and of future work directions in the field of synthetic population generation for ABS.

Different approaches have been used for synthetic population generation. They can be grouped into three categories: Synthetic Reconstruction (SR), Combinatorial Optimisation (CO), and Statistical Learning (SL). SR methods (e.g., Jiang et al. (2022); Fabrice Yaméogo et al. (2021); Ponge et al. (2021); Pritchard and Miller (2012); Müller (2017)) involve fitting and allocation steps that generate synthetic populations by adjusting weights and cell counts. CO methods (e.g., Chapuis et al. (2022); Harland et al. (2012); Wu et al. (2022); Kurban et al. (2011); Chen et al. (2016); Srinivasan et al. (2008)) involve finding the best solution from a set of possibilities using optimisation techniques. SL methods (e.g., Sun et al. (2018); Farooq et al. (2013); Saadi et al. (2016)) focus on the joint distribution of attributes and use machine learning and probabilistic methods. Each approach has its strengths and limitations regarding accuracy, computational requirements, and data availability. SR methods rely on simplifying assumptions and give accurate results when high-quality marginal data are available, while SL techniques capture complex relationships between attributes but may be computationally demanding and require extensive training data. In contrast, CO offers a flexible approach that can optimise multiple objectives, including over hierarchical structures, and, depending on the nature of the problem and the quality of the data, can handle data sparsity. However, CO may require significant computational resources and tuning. The choice of approach depends on the modelling goals, constraints, data availability, and resources. Our proposed CO-based approach efficiently generates a customisable, representative large-scale population, using multi-objective optimisation to fit individual attributes to census data while respecting a hierarchical structure.

Proposed Approach

In this section, we describe our proposed approach. First, we describe how synthetic population generation can be formulated as a multi-objective optimisation problem. Next, we discuss how the NSGA-II genetic algorithm can be used as a multi-objective evolutionary optimisation method to generate synthetic populations and optimise them with respect to census contingency tables. As a proof of concept, we conduct a case study of the Oxfordshire region, using UK census data to generate a hierarchical population of persons and households. Finally, we discuss the results and evaluate the proposed approach.

Problem Formulation

Given: A set of selected attributes of a real population, where each attribute $A_i$ has a set of $m_i$ categories $C_{A_i}$ (or groups, e.g., age = 0-5, 6-10, ..., 81-85), with respective frequencies $F_{A_i}$ in the real population:

$$\mathcal{A} = \{A_1, A_2, \dots, A_N\} \tag{1}$$
$$C_{A_i} = \{c_{A_i,1}, c_{A_i,2}, \dots, c_{A_i,m_i}\} \tag{2}$$
$$F_{A_i} = \{f_{A_i,1}, f_{A_i,2}, \dots, f_{A_i,m_i}\} \tag{3}$$

Objective: Generate a synthetic population that closely resembles the real population's distribution of each attribute. Let $X$ be an individual in the GA population, representing one synthetic population (here, an individual refers to a candidate solution, not to a person in the synthetic population). For each attribute $A_i$, $i = 1, 2, \dots, N$, the objective function to be minimised is:

$$O_i(X) = \sum_{j=1}^{m_i} \left| f_{A_i,j} - f'_{A_i,j}(X) \right|, \tag{4}$$

where $f'_{A_i,j}(X)$ is the frequency of category $c_{A_i,j}$ in the synthetic population $X$.

Goal: Find a synthetic population $X^*$ that minimises all objective functions $O_i(X)$, $i = 1, 2, \dots, N$, simultaneously. Because these objectives generally conflict, this amounts to finding Pareto-optimal solutions, i.e., solutions for which no $O_i$ can be reduced without increasing another.
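Eq. (4) translates directly into code once the category frequencies of a candidate synthetic population are counted. The following is a minimal sketch, assuming each synthetic person is a dict of attribute values and that the census frequencies $F_{A_i}$ are absolute counts; the function names and toy numbers are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Person = Dict[str, str]  # hypothetical encoding: {attribute: category}


def category_counts(population: List[Person], attribute: str,
                    categories: Sequence[str]) -> List[int]:
    """Frequencies f'_{A_i,j}(X) of each category of `attribute` in population X."""
    counts = Counter(person[attribute] for person in population)
    return [counts[c] for c in categories]


def objective(population: List[Person], attribute: str,
              categories: Sequence[str], target: Sequence[int]) -> int:
    """O_i(X) of Eq. (4): sum over categories of |f_{A_i,j} - f'_{A_i,j}(X)|."""
    synthetic = category_counts(population, attribute, categories)
    return sum(abs(f - f_syn) for f, f_syn in zip(target, synthetic))


def objective_vector(population: List[Person],
                     spec: Dict[str, Tuple[Sequence[str], Sequence[int]]]) -> Tuple[int, ...]:
    """(O_1(X), ..., O_N(X)); `spec` maps each attribute to (categories, census counts)."""
    return tuple(objective(population, attr, cats, target)
                 for attr, (cats, target) in spec.items())


# Toy targets (not census figures): two attributes given as counts per category.
spec = {
    "sex": (["M", "F"], [2, 2]),
    "age": (["0-17", "18-64", "65+"], [1, 2, 1]),
}
X = [
    {"sex": "M", "age": "18-64"},
    {"sex": "M", "age": "18-64"},
    {"sex": "M", "age": "0-17"},
    {"sex": "F", "age": "65+"},
]
print(objective_vector(X, spec))  # (2, 0)
```

Here the age distribution is matched exactly while sex is off by one person in each category, which is the kind of per-attribute trade-off the multi-objective formulation keeps separate instead of summing away.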

Generating a synthetic population involves creating a sample of individuals and households with specific characteristics that closely resemble the actual population. The goal is to capture the distributions of selected attributes found in the real population while preserving privacy. Contingency tables, which display the relationships between categorical variables, describe the statistical relationships between population characteristics and help analyse patterns and trends among different demographic groups. The representativeness of a synthetic population can therefore be measured by how well it reproduces the frequency distributions given by these tables. Trade-offs may be necessary when fitting multiple contingency tables, since some tables may be more important than others depending on the application; our approach allows practitioners to balance these objectives and obtain a synthetic population that suits their needs. In this study, we validate our methodology using cross tables from the 2011 UK census Statistics (2010), which include bivariate and trivariate tables combining different attributes. Here, we use trivariate contingency tables (e.g., Sex:Age:Ethnicity, Sex:Age:Religion, Sex:Age:Qualification; see Figure 4).
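Fitting a trivariate table such as Sex:Age:Ethnicity compares joint cell counts rather than single-attribute marginals. A minimal sketch, assuming persons are dicts and a census table is a mapping from cells such as ('M', '18-64', 'W1') to counts; the names and numbers are hypothetical, not census figures. Each such table could then serve as one objective.

```python
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[str, ...]


def cross_tab(persons: List[Dict[str, str]], attributes: Tuple[str, ...]) -> Counter:
    """Count synthetic persons in each cell of the contingency table over `attributes`."""
    return Counter(tuple(p[a] for a in attributes) for p in persons)


def table_error(persons: List[Dict[str, str]], attributes: Tuple[str, ...],
                census: Dict[Cell, int]) -> int:
    """Sum of absolute cell differences; a cell missing from either table counts as 0."""
    synthetic = cross_tab(persons, attributes)
    cells = set(census) | set(synthetic)
    return sum(abs(census.get(c, 0) - synthetic[c]) for c in cells)


census = {("M", "18-64", "W1"): 2, ("F", "18-64", "A1"): 1, ("F", "65+", "W1"): 1}
persons = [
    {"sex": "M", "age": "18-64", "ethnicity": "W1"},
    {"sex": "F", "age": "18-64", "ethnicity": "A1"},
    {"sex": "F", "age": "18-64", "ethnicity": "A1"},
    {"sex": "F", "age": "65+", "ethnicity": "W1"},
]
print(table_error(persons, ("sex", "age", "ethnicity"), census))  # 2
```

Note that the toy population matches the sex and age marginals of the census table exactly, yet still has a cell error of 2; joint tables expose mismatches that marginal comparisons miss.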

A multi-objective optimisation algorithm optimises two or more objectives simultaneously and produces a set of Pareto-optimal solutions that represent different balances between the objectives. The algorithm iteratively evolves a population of candidate solutions by applying genetic operators (selection, crossover, and mutation) while considering all objectives, which can be weighted according to their significance (e.g., in certain use cases the economic attributes of persons and households may be more significant than ethnicity and religion). The Pareto-optimal solutions represent trade-offs between the different distributions, allowing decision-makers to choose the most suitable synthetic population for their requirements. Motivated by these features, and by the complex hierarchical structure of synthetic populations, we employ the NSGA-II algorithm.
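The Pareto dominance relation behind these solutions can be made concrete in a few lines. A minimal sketch for minimisation, assuming each candidate is summarised by its objective vector $(O_1(X), \dots, O_N(X))$; the brute-force filter below is quadratic in the number of candidates and only illustrates the definition, whereas NSGA-II uses fast non-dominated sorting. Function names are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b` under minimisation:
    no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Sequence[float]]) -> List[int]:
    """Indices of points not dominated by any other point (the first Pareto front)."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


objs = [(3, 5), (4, 4), (5, 3), (4, 6), (6, 6)]
print(non_dominated(objs))  # [0, 1, 2]
```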

Multi-Objective Combinatorial Optimisation using Genetic Algorithms

We formulate population and household synthesis as a multi-objective combinatorial optimisation problem. We first present a brief primer on genetic algorithms (GAs) Wirsansky (2020) and the rationale for using this approach in population synthesis. A GA is an evolutionary computation technique inspired by the process of natural selection. GAs maintain a population of candidate solutions, which reproduce over multiple generations. In the context of our work, candidate solutions correspond to synthetic populations. A predefined selection process determines which candidate solutions may reproduce at the end of each generation. The success of a candidate in the selection process is determined by its fitness, which is evaluated via a fitness function. The fitness of a synthetic population describes how well it recreates the frequency distributions of the contingency tables. Reproduction consists of both crossover and mutation. We describe each component of a GA in more detail below:

  • Selection: This is the process of choosing individuals from the current population based on their fitness values. Selection favours individuals with higher fitness values (or, in our case, lower values, since we minimise error), so that the best solutions have a higher probability of being chosen for reproduction. Common selection methods include tournament selection, roulette wheel selection, and rank-based selection Deb (2011).

  • Crossover (or recombination): This operation combines the genetic material of two parent individuals to produce one or more offspring. The goal of crossover is to create new individuals that inherit the best traits from their parents, potentially leading to better solutions in the next generation. There are various types of crossover operators, such as one-point crossover, two-point crossover, and uniform crossover.

  • Mutation: This operation introduces small random changes in an individual's genetic material. Mutation helps maintain diversity in the population and prevents premature convergence to sub-optimal solutions. Mutation operators can vary depending on the problem representation; for example, bit-flip mutation for binary strings or Gaussian mutation for real-valued representations.

  • Fitness Evaluation: The fitness function evaluates the quality of each individual in the population based on how well it solves the given problem. It assigns a fitness value to each individual, which is then used for selection and for identifying the best solutions. The fitness function is problem-specific and designed to guide the search towards optimal or near-optimal solutions.
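The four components above can be seen working together in a deliberately small, single-objective GA. This is a generic textbook sketch on bit strings, not the paper's population encoding: the target-matching error (lower is better), binary tournament selection, one-point crossover and bit-flip mutation are illustrative choices, and all names are hypothetical.

```python
import random
from typing import List, Tuple

Genome = List[int]


def fitness(genome: Genome, target: Genome) -> int:
    """Error to minimise: number of positions where the genome differs from the target."""
    return sum(g != t for g, t in zip(genome, target))


def tournament(pop: List[Genome], errors: List[int], k: int,
               rng: random.Random) -> Genome:
    """Selection: draw k individuals at random and keep the one with the lowest error."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: errors[i])]


def one_point_crossover(a: Genome, b: Genome,
                        rng: random.Random) -> Tuple[Genome, Genome]:
    """Crossover: exchange the tails of two parents after a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip(genome: Genome, rate: float, rng: random.Random) -> Genome:
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def run_ga(target: Genome, pop_size: int = 30, generations: int = 60,
           seed: int = 0) -> Genome:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in target] for _ in range(pop_size)]
    for _ in range(generations):
        errors = [fitness(g, target) for g in pop]            # fitness evaluation
        offspring: List[Genome] = []
        while len(offspring) < pop_size:
            p1 = tournament(pop, errors, 2, rng)               # selection
            p2 = tournament(pop, errors, 2, rng)
            for child in one_point_crossover(p1, p2, rng):     # crossover
                offspring.append(bit_flip(child, 1 / len(target), rng))  # mutation
        pop = offspring[:pop_size]
    return min(pop, key=lambda g: fitness(g, target))


target = [1, 0] * 10
best = run_ga(target)
print(fitness(best, target))
```

This loop has no elitism, so the best solution can occasionally be lost between generations; NSGA-II, discussed next, avoids this by merging parents and offspring before selection.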

Figure 1 shows the flow of the genetic algorithm.

Figure 1: Genetic Algorithm Flow

The Non-dominated Sorting Genetic Algorithm II (NSGA-II) is a popular multi-objective optimisation algorithm that extends the conventional GA framework described above to handle problems with multiple conflicting objectives. NSGA-II uses a fast non-dominated sorting procedure to rank individuals into successive Pareto fronts. In addition, it uses a crowding distance metric to maintain diversity in the population, preventing premature convergence to sub-optimal solutions. Using NSGA-II for synthetic population generation offers several advantages over traditional single-objective optimisation techniques: (i) it can optimise multiple aspects of the population simultaneously without requiring the objectives to be combined into a single value; (ii) it uses a Pareto-based approach to identify non-dominated solutions that represent the best trade-offs between objectives, allowing stakeholders to choose the most suitable synthetic population; (iii) it preserves diversity in the population using a crowding distance metric, and uses elitism to retain the best solutions found in previous generations; and (iv) it scales to problems with a large number of decision variables and a moderate number of objectives.
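The crowding distance of Deb et al. (2002) can be sketched for a single front as follows: for each objective, the two boundary solutions receive an infinite distance, and every interior solution accumulates the gap between its two neighbours, normalised by that objective's range. A minimal sketch; the function name is ours.

```python
import math
from typing import List, Sequence


def crowding_distance(front: List[Sequence[float]]) -> List[float]:
    """Crowding distance of each objective vector in one non-dominated front."""
    n = len(front)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = math.inf      # always keep the extremes
        if hi == lo:
            continue                                     # no spread in this objective
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


front = [(1.0, 5.0), (2.0, 3.0), (4.0, 2.0), (5.0, 1.0)]
print(crowding_distance(front))  # [inf, 1.5, 1.25, inf]
```

Within a front, NSGA-II prefers solutions with a larger crowding distance, so isolated solutions and the extremes survive truncation; this is how the algorithm keeps the front spread out.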

Figure 2: Synthetic Population Generation using NSGA-II
Figure 3: Individual generation and Fitness Calculation

Our proposed algorithm is given in Figure 2. First, we define the data structure and encoding of the individuals (solutions) and of the population, which stores a group of individuals. The problem is to generate random samples of persons and households and then allocate persons to households subject to census data constraints. The objective is to minimise the error between the generated samples and the actual census data for each selected attribute; the problem is therefore multi-objective (lines 1–3). We then create an initial population of random individuals (in GA terminology, an individual is the entity being generated; in our case it could be a person or a household), using the procedure shown in Figure 3 (line 4). In this procedure, we first calculate attribute weights from the census data tables. We then draw random samples for each attribute and combine these attribute samples to form a set of individuals. A rule-based validation routine rejects any random combination that violates a rule (e.g., an individual aged under 18 cannot be married). Next, we calculate the fitness value of each individual in the initial population using the fitness evaluation procedure shown in Figure 3. We propose computing fitness as the total area between the two curves of the generated sample and the actual data, using trapezoidal numerical integration Yeh et al. (2002) (line 5). Because it aggregates the difference over the whole curve, this measure captures the overall difference between the distributions, taking their shape into account, rather than comparing categories in isolation. Hence, it indicates how well the generated population matches the target population across the entire domain.
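The two procedures of Figure 3 can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: attribute categories are sampled independently with probabilities proportional to hypothetical census counts (`CENSUS`), one illustrative rule rejects married persons under 18, and the fitness is the trapezoidal area under the pointwise absolute difference between the actual and generated category counts, with unit spacing between consecutive categories.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical census marginals: attribute -> {category: count}.
CENSUS: Dict[str, Dict[str, int]] = {
    "age": {"0-17": 20, "18-64": 60, "65+": 20},
    "marital": {"single": 50, "married": 50},
}


def is_valid(person: Dict[str, str]) -> bool:
    """Illustrative rule: a person under 18 cannot be married."""
    return not (person["age"] == "0-17" and person["marital"] == "married")


def random_person(rng: random.Random) -> Dict[str, str]:
    """Sample each attribute with census-derived weights; resample until the rules hold."""
    while True:
        person = {
            attr: rng.choices(list(cats), weights=list(cats.values()))[0]
            for attr, cats in CENSUS.items()
        }
        if is_valid(person):
            return person


def trapezoid_area(y: Sequence[float]) -> float:
    """Trapezoidal rule with unit spacing between consecutive categories."""
    return sum((y[k] + y[k + 1]) / 2 for k in range(len(y) - 1))


def area_fitness(actual: Sequence[float], generated: Sequence[float]) -> float:
    """Area between the actual and generated frequency curves (lower is better)."""
    return trapezoid_area([abs(a - g) for a, g in zip(actual, generated)])


rng = random.Random(42)
persons: List[Dict[str, str]] = [random_person(rng) for _ in range(100)]
cats = list(CENSUS["age"])
counts = Counter(p["age"] for p in persons)
print(area_fitness([CENSUS["age"][c] for c in cats], [counts[c] for c in cats]))
```

With unit spacing, this area equals the sum of absolute differences minus half of the two end-point differences, so it is closely related to Eq. (4) while giving less weight to the first and last categories. Rejection sampling also distorts the sampled marginals slightly (here, fewer young people), which the subsequent optimisation has to correct.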
Next, we create a data structure, called the Pareto frontier, to store the best non-dominated solutions found throughout the generations. (In multi-objective optimisation, the Pareto frontier is the set of optimal solutions representing the trade-offs between conflicting objectives: no objective can be improved without worsening at least one other objective.) The algorithm then enters the main loop of the evolutionary process. For a given maximum number of generations, we iterate through each generation and choose a set of parent individuals from the current population using binary tournament selection, based on the individuals' rank and crowding distance. Crowding distance measures the distance between a solution and its neighbouring solutions in the objective space. We then create a new set of offspring individuals by applying genetic operators (crossover and mutation) to the selected parents. We use two-point crossover, which selects two random points along the length of the parent chromosomes and swaps the segments between these points to create new offspring. For mutation, we implemented a swap technique that randomly selects an attribute of an individual and swaps its value with that of another individual. After the genetic operators are applied, we compute the fitness values of each offspring individual using the fitness evaluation function. We then merge the current and offspring populations into a combined population and rank its individuals into non-dominated fronts using fast non-dominated sorting. Next, we compute the crowding distance of each individual in each front, which measures how crowded the solutions are in the objective space. Finally, we choose the best individuals from the combined population, considering both rank and crowding distance, to form the population of the next generation.
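The two genetic operators described above can be sketched on a candidate represented as a list of persons (dicts). Two-point crossover exchanges the slice between two random cut points. For mutation, we assume that the swap exchanges one randomly chosen attribute value between two randomly chosen persons in the same candidate, which is one possible reading of the description above. All names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Person = Dict[str, str]
Candidate = List[Person]  # one GA individual = one whole synthetic population


def two_point_crossover(a: Candidate, b: Candidate,
                        rng: random.Random) -> Tuple[Candidate, Candidate]:
    """Swap the persons between two random cut points (parents of equal length >= 3).

    Children share person dicts with their parents, so operators that modify
    persons must copy them first (as swap_mutation does).
    """
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def swap_mutation(cand: Candidate, rng: random.Random) -> Candidate:
    """Exchange one randomly chosen attribute value between two random persons."""
    mutant = [dict(p) for p in cand]            # leave the parent unchanged
    p, q = rng.sample(range(len(mutant)), 2)
    attr = rng.choice(list(mutant[p]))
    mutant[p][attr], mutant[q][attr] = mutant[q][attr], mutant[p][attr]
    return mutant


rng = random.Random(1)
a = [{"sex": "M", "age": "18-64"} for _ in range(5)]
b = [{"sex": "F", "age": "65+"} for _ in range(5)]
c1, c2 = two_point_crossover(a, b, rng)
m = swap_mutation(a + b, rng)
# The per-attribute counts are unchanged by this mutation.
print(Counter(p["sex"] for p in m) == Counter(p["sex"] for p in a + b))  # True
```

Under this reading, the mutation leaves every single-attribute marginal unchanged and only alters joint combinations, so it mainly acts on cross-table objectives such as those in Figure 4; a variant that swaps with a person from another candidate would also change the marginals.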
We then add the best non-dominated solutions from the current population to the Pareto frontier (lines 7–17). Finally, we return the final population selected from the Pareto frontier of non-dominated solutions (line 18). Selecting the best solution from a Pareto frontier depends on the decision-maker's preferences. We use a weighted sum approach: we assign weights to the objectives and select the solution with the best weighted sum of objective values (i.e., the lowest weighted error). Once the algorithm terminates, we retrieve the generated population of individuals and store it in a CSV file.
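A minimal sketch of this final step, under our own assumptions: objectives are treated as errors and min-max normalised across the front so that the weights are comparable, and the candidate with the lowest weighted sum is chosen (equivalent to taking the highest weighted sum when, as in DEAP's minimisation convention, the weights are negative). The weights, function names, and CSV layout are hypothetical; the file is written with Python's standard `csv` module.

```python
import csv
from typing import Dict, List, Sequence


def pick_weighted(front_objs: List[Sequence[float]], weights: Sequence[float]) -> int:
    """Index of the front member with the lowest weighted sum of min-max normalised errors."""
    n_obj = len(weights)
    lo = [min(o[m] for o in front_objs) for m in range(n_obj)]
    hi = [max(o[m] for o in front_objs) for m in range(n_obj)]

    def score(o: Sequence[float]) -> float:
        # Objectives with no spread across the front contribute nothing.
        return sum(
            w * ((o[m] - lo[m]) / (hi[m] - lo[m]) if hi[m] > lo[m] else 0.0)
            for m, w in enumerate(weights)
        )

    return min(range(len(front_objs)), key=lambda i: score(front_objs[i]))


def write_population(path: str, persons: List[Dict[str, str]]) -> None:
    """Store the selected synthetic population as CSV, one row per person."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(persons[0]))
        writer.writeheader()
        writer.writerows(persons)


front_objs = [(10.0, 40.0), (20.0, 20.0), (40.0, 10.0)]
print(pick_weighted(front_objs, weights=(0.7, 0.3)))  # 0
```

Shifting the weights moves the choice along the front: weighting the second objective more heavily selects the last solution, and equal weights select the balanced middle one.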

UK Case Study

In this section, we present the implementation details of our approach and a case study in which we generate a representative synthetic population of a selected region in the UK. Our case study is conducted at the geographical scale of Middle Super Output Areas (MSOAs). There are approximately 7,200 MSOAs in England and Wales, and each MSOA contains between 5,000 and 15,000 residents. MSOAs are used as geographic building blocks for analysing data, for gaining insights into the distribution of characteristics across larger areas, and for informing policy-making and interventions. We use UK Census data for the attributes of persons and households Statistics (2010). In the persons data, ethnicity is encoded as follows: W1-W4 are White categories; M1-M4 are Mixed categories; A1-A5 are Asian categories; B1-B3 are Black categories; and O1-O2 are Other categories. Similarly, religion is encoded as: C = Christian, B = Buddhist, H = Hindu, J = Jewish, M = Muslim, S = Sikh, O = Other religions, N = No religion, and NS = Religion not stated. The household composition categories in the household data are listed in Figure 4(b).
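The coding scheme above lends itself to a simple lookup that collapses detailed codes into the main groups used in the Results section (wht, mxd, asn, blk, oth). A minimal sketch, assuming codes are strings such as 'W2' whose first letter identifies the main group; the function and variable names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# First letter of a detailed ethnicity code -> main group label.
ETHNIC_GROUP = {"W": "wht", "M": "mxd", "A": "asn", "B": "blk", "O": "oth"}

# Religion codes as listed above (a separate namespace: "M" here means Muslim).
RELIGION = {
    "C": "Christian", "B": "Buddhist", "H": "Hindu", "J": "Jewish", "M": "Muslim",
    "S": "Sikh", "O": "Other religions", "N": "No religion", "NS": "Religion not stated",
}


def ethnic_group(code: str) -> str:
    """Map a detailed code such as 'W2' or 'A5' to its main group."""
    return ETHNIC_GROUP[code[0]]


def group_counts(codes: Iterable[str]) -> Dict[str, int]:
    """Aggregate detailed ethnicity codes into main-group totals."""
    return dict(Counter(ethnic_group(c) for c in codes))


print(group_counts(["W1", "W3", "A2", "B1", "M4", "W2"]))
# {'wht': 3, 'asn': 1, 'blk': 1, 'mxd': 1}
```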

Figure 4: (a) Input Tables for fitness evaluation (b) Household Composition Types
Figure 5: (a) Generated Persons (b) Generated Households
Figure 6: Generation of Persons [Blue = Actual, Red = predicted]
Figure 7: Generation of Households and Household compositions [Blue = Actual, Red = predicted]
Figure 8: (a) Pareto frontier [red = selected best solution] (b) Convergence of different objectives

In this case study, we consider two types of entities in our synthetic population: (i) persons and (ii) households. We aim to generate samples of persons and households according to the statistics of the selected MSOA and to fit both sets to the contingency tables shown in Figure 4. Our implementation builds on DEAP (Distributed Evolutionary Algorithms in Python) Fortin et al. (2012), which we have extended to support synthetic population generation through: (a) a variety of input data; (b) selection of individuals' attributes; (c) definition of multiple objectives; (d) logical design of how random individuals are generated, with rule-based validation; (e) design of complex fitness evaluation criteria; (f) addition or modification of genetic operators; and (g) performance improvements using parallel processing. Our implementation is available on GitHub (https://github.com/imqhashmi/SynPoP-GA).

Results and Analysis

This section presents the outputs of the execution runs and the results of our case study. We performed our analysis on one selected MSOA at a time; however, multiple MSOAs can be processed in parallel to speed up the generation of the entire population at country scale. We designed the framework to operate in two stages: (i) generating persons and (ii) generating households, because household generation requires the persons population as input. Figure 5 shows several generated samples of persons and households. Figure 6 compares the actual and generated (predicted) populations. We group the age attribute into three categories: (a) children (ch); (b) adults (ad); and (c) elders (el). Similarly, we group ethnicity into main groups: (a) White (wht); (b) Mixed (mxd); (c) Asian (asn); (d) Black (blk); and (e) Others (oth). The figure shows the difference between the total of each generated group (red) and the actual data (blue). We also report the root mean square error (RMSE) as a measure of this difference. In the next stage, we generated households, as shown in Figure 7. At this stage, we constructed household compositions by allocating individuals from the persons population to suitable households based on household attributes such as size, type, and composition structure (see Figure 4). For example, to fill a household of size 7 with composition type '2A 3C', we search the pool of persons and allocate two adults and five children. Currently, this allocation does not take ethnicity, religion, or other pertinent features into account; this is left for future work. A typical run time for a single generation, for an area of 7,000 persons and a GA population size of 100, is 5-7 seconds. Running 500 generations takes 30-35 minutes.
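The allocation step and the RMSE measure can be sketched as follows. This is a simplified illustration, not the authors' procedure: persons are assumed to be pre-split into adult and child pools, each household is described by hypothetical `adults` and `children` counts (the size-7 '2A 3C' household above becomes 2 adults and 5 children), members are drawn without replacement, and RMSE is computed over grouped totals such as ch/ad/el.

```python
import math
import random
from typing import Dict, List, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and generated group totals."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def allocate(households: List[Dict[str, int]], adults: List[str], children: List[str],
             rng: random.Random) -> List[List[str]]:
    """Fill each household with the requested numbers of adults and children, drawn
    without replacement; a household is skipped if the pools cannot satisfy it."""
    adults, children = adults[:], children[:]   # do not mutate the caller's pools
    rng.shuffle(adults)
    rng.shuffle(children)
    allocated = []
    for hh in households:
        if len(adults) < hh["adults"] or len(children) < hh["children"]:
            continue  # not enough persons left for this composition
        members = [adults.pop() for _ in range(hh["adults"])]
        members += [children.pop() for _ in range(hh["children"])]
        allocated.append(members)
    return allocated


print(rmse([10, 20, 30, 40], [13, 16, 30, 40]))  # 2.5
hh_specs = [{"adults": 2, "children": 5}, {"adults": 1, "children": 0}]
homes = allocate(hh_specs, ["a1", "a2", "a3"], ["c1", "c2", "c3", "c4", "c5"],
                 random.Random(0))
print([len(h) for h in homes])  # [7, 1]
```

A greedy pass like this is order-dependent: households processed later may find the pools exhausted, so a practical version would need to order or revisit households, and would also need the attribute-aware matching the text lists as future work.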
We expect parallel random sampling of individuals, parallel fitness evaluation, and parallel genetic operations to reduce the execution time substantially; this is left for future work. After a run of 500 generations, a convergence plot is generated, as shown in Figure 8(b), where each line represents one optimisation objective (i.e., one objective for each of the five person attributes), the x-axis shows the number of generations, and the y-axis shows the decrease in normalised fitness. The rate of convergence depends on the characteristics of each attribute and on the genetic operators used. When the execution is complete, we generate a Pareto frontier pair plot, as shown in Figure 8(a). In the pair plot, each pair of objectives is plotted against each other in a scatter plot, and the diagonal plots show the distribution of each objective. The plot highlights the selected best solution in red, chosen using our weighted sum of differences method.

Summary and Conclusion

In this paper, we present a novel approach for synthetic population creation in agent-based simulations that addresses the challenges of accuracy and representativeness. Using the NSGA-II algorithm, a multi-objective combinatorial optimisation technique, we demonstrate the effectiveness of our approach through a case study. The results indicate its suitability for complex, large-scale problems, with the potential for improved accuracy and representativeness compared to traditional methods.

This case study serves as a proof of concept, validating our population synthesis approach for agent-based simulations. The focus is on optimising multiple objectives, such as demographic characteristics, to represent the target population accurately. The findings indicate that our proposed method generates high-quality synthetic populations that mirror the target population's characteristics. Furthermore, our approach is efficient, scalable, and easily adaptable to different geographic regions, input data, and types of individuals (e.g., persons, households, cars, organisations). Notably, it supports creating and fitting hierarchical structures from input data, enabling the allocation of persons to households and, by extension, the assignment of cars to individuals and of workplaces to persons.

We argue that multi-objective combinatorial optimisation is a comprehensive approach for synthetic population generation, capable of optimising multiple objectives simultaneously across diverse problem domains. This work contributes to the field of agent-based modelling and simulation, opening avenues for developing more realistic, large-scale models across various domains. Future work includes extending our household composition scheme to incorporate ethnicity and religion, and improving computational efficiency through parallel processing in random individual generation, fitness evaluation, and genetic operations.

Acknowledgement

This research was supported by a UKRI AI World Leading Researcher Fellowship awarded to Wooldridge (grant EP/W002949/1). M. Wooldridge and A. Calinescu acknowledge funding from Trustworthy AI - Integrating Learning, Optimisation and Reasoning (TAILOR) (https://tailor-network.eu/), a project funded by the European Union Horizon 2020 research and innovation programme under Grant Agreement 952215.

References

  • Barthelemy and Toint (2013) Barthelemy J. and Toint P.L., 2013. Synthetic population generation without a sample. Transportation Science, 47, no. 2, 266–279.
  • Bonabeau (2002) Bonabeau E., 2002. Agent-based modeling: Methods and techniques for simulating human systems. Proceedings of the national academy of sciences, 99, no. suppl_3, 7280–7287.
  • Chapuis et al. (2022) Chapuis K.; Taillandier P.; and Drogoul A., 2022. Generation of synthetic populations in social simulations: A review of methods and practices. Journal of Artificial Societies and Social Simulation, 25, no. 2.
  • Chen et al. (2016) Chen Y.; Elliot M.; and Sakshaug J., 2016. A genetic algorithm approach to synthetic data production. In Proceedings of the 1st International Workshop on AI for Privacy and Security. 1–4.
  • Deb (2011) Deb K., 2011. Multi-objective optimisation using evolutionary algorithms: an introduction. Springer.
  • Deb et al. (2002) Deb K.; Pratap A.; Agarwal S.; and Meyarivan T., 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, 6, no. 2, 182–197.
  • Fabrice Yaméogo et al. (2021) Fabrice Yaméogo B.; Gastineau P.; Hankach P.; and Vandanjon P.O., 2021. Comparing methods for generating a two-layered synthetic population. Transportation research record, 2675, no. 1, 136–147.
  • Farooq et al. (2013) Farooq B.; Bierlaire M.; Hurtubia R.; and Flötteröd G., 2013. Simulation based population synthesis. Transportation Research Part B: Methodological, 58, 243–263.
  • Fortin et al. (2012) Fortin F.A.; De Rainville F.M.; Gardner M.A.; Parizeau M.; and Gagné C., 2012. DEAP: Evolutionary Algorithms Made Easy. Journal of Machine Learning Research, 13, 2171–2175.
  • Harland et al. (2012) Harland K.; Heppenstall A.; Smith D.; and Birkin M.H., 2012. Creating realistic synthetic populations at varying spatial scales: A comparative critique of population synthesis techniques. Journal of Artificial Societies and Social Simulation, 15, no. 1.
  • Hörl and Balac (2021) Hörl S. and Balac M., 2021. Synthetic population and travel demand for Paris and Île-de-France based on open and publicly available data. Transportation Research Part C: Emerging Technologies, 130, 103291.
  • Jiang et al. (2022) Jiang N.; Crooks A.T.; Kavak H.; Burger A.; and Kennedy W.G., 2022. A method to create a synthetic population with social networks for geographically-explicit agent-based models. Computational Urban Science, 2, no. 1, 7.
  • Kurban et al. (2011) Kurban H.; Gallagher R.; Kurban G.A.; and Persky J., 2011. A beginner’s guide to creating small-area cross-tabulations. Cityscape, 225–235.
  • Macal and North (2005) Macal C.M. and North M.J., 2005. Tutorial on agent-based modeling and simulation. In Proceedings of the Winter Simulation Conference, 2005. IEEE, 14 pp.
  • Mahmood et al. (2022) Mahmood I.; Arabnejad H.; Suleimenova D.; Sassoon I.; Marshan A.; Serrano-Rico A.; Louvieris P.; Anagnostou A.; JE Taylor S.; Bell D.; et al., 2022. FACS: a geospatial agent-based simulator for analysing COVID-19 spread and public health measures on local regions. Journal of Simulation, 16, no. 4, 355–373.
  • Müller (2017) Müller K., 2017. A generalized approach to population synthesis. Ph.D. thesis, ETH Zurich.
  • Ponge et al. (2021) Ponge J.; Enbergs M.; Schüngel M.; Hellingrath B.; Karch A.; and Ludwig S., 2021. Generating synthetic populations based on german census data. In 2021 Winter Simulation Conference (WSC). IEEE, 1–12.
  • Pritchard and Miller (2012) Pritchard D.R. and Miller E.J., 2012. Advances in population synthesis: fitting many attributes per agent and fitting to household and person margins simultaneously. Transportation, 39, no. 3, 685–704.
  • Saadi et al. (2016) Saadi I.; Mustafa A.; Teller J.; Farooq B.; and Cools M., 2016. Hidden Markov Model-based population synthesis. Transportation Research Part B: Methodological, 90, 1–21.
  • Smith et al. (2017) Smith A.; Lovelace R.; and Birkin M., 2017. Population synthesis with quasirandom integer sampling. Journal of Artificial Societies and Social Simulation, 20, no. 4.
  • Srinivasan et al. (2008) Srinivasan S.; Ma L.; and Yathindra K., 2008. Procedure for forecasting household characteristics for input to travel-demand models. Tech. rep.
  • Statistics (2010) Statistics N., 2010. Nomis: Official Census and Labour Market Statistics. https://www.nomisweb.co.uk/. [Accessed 15-Apr-2023].
  • Sun et al. (2018) Sun L.; Erath A.; and Cai M., 2018. A hierarchical mixture modeling framework for population synthesis. Transportation Research Part B: Methodological, 114, 199–212.
  • Wirsansky (2020) Wirsansky E., 2020. Hands-on genetic algorithms with Python: applying genetic algorithms to solve real-world deep learning and artificial intelligence problems. Packt Publishing Ltd.
  • Wu et al. (2022) Wu G.; Heppenstall A.; Meier P.; Purshouse R.; and Lomax N., 2022. A synthetic population dataset for estimating small area health and socio-economic outcomes in Great Britain. Scientific Data, 9, no. 1, 19.
  • Ye et al. (2009) Ye X.; Konduri K.; Pendyala R.M.; Sana B.; and Waddell P., 2009. A methodology to match distributions of both household and person attributes in the generation of synthetic populations. In 88th Annual Meeting of the transportation research Board, Washington, DC.
  • Yeh et al. (2002) Yeh S.T. et al., 2002. Using trapezoidal rule for the area under a curve calculation. Proceedings of the 27th Annual SAS® User Group International (SUGI’02), 1–5.