-
Self-Labeling the Job Shop Scheduling Problem
Authors:
Andrea Corsini,
Angelo Porrello,
Simone Calderara,
Mauro Dell'Amico
Abstract:
In this work, we propose a Self-Supervised training strategy specifically designed for combinatorial problems. One of the main obstacles in applying supervised paradigms to such problems is the requirement of expensive target solutions as ground-truth, often produced with costly exact solvers. Inspired by Semi- and Self-Supervised learning, we show that it is possible to easily train generative models by sampling multiple solutions and using the best one according to the problem objective as a pseudo-label. In this way, we iteratively improve the model's generation capability by relying only on its self-supervision, completely removing the need for optimality information. We validate the effectiveness of this Self-Labeling strategy on the Job Shop Scheduling Problem (JSP), a complex combinatorial problem that is receiving much attention from the Reinforcement Learning community. We propose a generative model based on the well-known Pointer Network and train it with our strategy. Experiments on popular benchmarks demonstrate the potential of this approach, as the resulting models outperform constructive heuristics and current state-of-the-art learning proposals for the JSP.
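The labeling step of this strategy can be illustrated with a small Python sketch. This is an assumption-based illustration, not the authors' implementation: a uniformly random job sequence (the hypothetical `sample_solution`) stands in for sampling from the Pointer Network, solutions use a job-sequence encoding in which each job index appears once per operation, and the makespan comes from a simple decoder that starts each operation as early as its job and machine allow, in sequence order. The training step, which would increase the model's likelihood of the selected pseudo-label, is omitted.

```python
import random

# Hypothetical toy JSP instance: each job is a list of (machine, duration) operations.
INSTANCE = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3), (0, 1)],
]


def makespan(instance, job_sequence):
    """Decode a job sequence (job j appears once per operation of j) into a
    semi-active schedule and return its makespan."""
    next_op = [0] * len(instance)        # index of the next unscheduled op of each job
    job_ready = [0] * len(instance)      # finish time of each job's previous op
    n_machines = 1 + max(m for job in instance for m, _ in job)
    machine_ready = [0] * n_machines     # time at which each machine becomes free
    for j in job_sequence:
        machine, duration = instance[j][next_op[j]]
        start = max(job_ready[j], machine_ready[machine])
        job_ready[j] = machine_ready[machine] = start + duration
        next_op[j] += 1
    return max(job_ready)


def sample_solution(instance, rng):
    """Stand-in for the generative model: a uniformly shuffled job sequence."""
    seq = [j for j, job in enumerate(instance) for _ in job]
    rng.shuffle(seq)
    return seq


def self_label(instance, n_samples, rng):
    """Sample several solutions and keep the best (lowest makespan) as pseudo-label."""
    samples = [sample_solution(instance, rng) for _ in range(n_samples)]
    return min(samples, key=lambda s: makespan(instance, s))


label = self_label(INSTANCE, n_samples=32, rng=random.Random(0))
print(label, makespan(INSTANCE, label))
```

In an actual self-labeling setup, the sampler would be the model being trained, and the selected sequence would serve as the supervised target for the next update.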
Submitted 8 July, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Parallel drone scheduling vehicle routing problems with collective drones
Authors:
Roberto Montemanni,
Mauro Dell'Amico,
Andrea Corsini
Abstract:
We study last-mile delivery problems where trucks and drones collaborate to deliver goods to final customers. In particular, we focus on problem settings where either a single truck or a fleet of several homogeneous trucks works in parallel with drones, and drones can collaborate on delivery missions. This cooperative behaviour, in which drones connect to each other and work together on some delivery tasks, enhances their potential: connected drones have increased lifting capability and can fly at higher speeds, overcoming the main limitations of settings where drones can only work independently.
In this work, we contribute a Constraint Programming model and a valid inequality for the single-truck version of the problem, namely the Parallel Drone Scheduling Traveling Salesman Problem with Collective Drones, and we introduce for the first time the variant with multiple trucks, called the Parallel Drone Scheduling Vehicle Routing Problem with Collective Drones. For the latter variant, we propose two Constraint Programming models and a Mixed Integer Linear Programming model.
An extensive experimental campaign leads to state-of-the-art results for the single-truck problem and provides some understanding of the presented models' behaviour on the multi-truck version. Finally, some insights into future research are discussed.
Submitted 7 July, 2023;
originally announced July 2023.
-
Constraint Programming models for the parallel drone scheduling vehicle routing problem
Authors:
Roberto Montemanni,
Mauro Dell'Amico
Abstract:
Drones are currently seen as a viable way to improve parcel distribution in urban and rural environments, working in coordination with traditional vehicles like trucks. In this paper we consider the parallel drone scheduling vehicle routing problem, where the service of a set of customers requiring a delivery is split between a fleet of trucks and a fleet of drones. We consider two variants of the problem. In the first, more theoretical variant, the target is to minimize the time required to complete the service and have all the vehicles back at the depot. In the second variant, more realistic constraints involving operating costs, capacity limitations and workload balance are considered, and the target is to minimize the total operational cost. We propose several constraint programming models to deal with the two problems. An experimental campaign on the instances previously adopted in the literature is presented to validate the new solving methods. The results show that, besides being a viable way to solve the problems to optimality, the models can also be used to derive effective heuristic solutions and high-quality lower bounds for the optimal cost if the execution is interrupted before its natural end.
Submitted 6 July, 2023;
originally announced July 2023.
-
Precedence-Constrained Arborescences
Authors:
Xiaochen Chou,
Mauro Dell'Amico,
Jafar Jamal,
Roberto Montemanni
Abstract:
The minimum-cost arborescence problem is a well-studied problem in graph theory, with known polynomial-time algorithms for solving it. Previous literature introduced variations on the original problem with different objective functions and/or constraints. Recently, the Precedence-Constrained Minimum-Cost Arborescence problem was proposed, in which precedence constraints are enforced on pairs of vertices. These constraints prevent the formation of directed paths that violate precedence relationships along the tree. We show that this problem is NP-hard, and we introduce a new scalable mixed integer linear programming model for it. The newly proposed model performs substantially better than the previous models. This work also introduces a new variation on the minimum-cost arborescence problem with precedence constraints. We show that this new variation is also NP-hard, and we propose several mixed integer linear programming models for formulating the problem.
Submitted 14 October, 2022; v1 submitted 3 August, 2022;
originally announced August 2022.
-
Learning the Quality of Machine Permutations in Job Shop Scheduling
Authors:
Andrea Corsini,
Simone Calderara,
Mauro Dell'Amico
Abstract:
In recent years, the power demonstrated by Machine Learning (ML) has increasingly attracted the interest of the optimization community, which is starting to leverage ML for enhancing and automating the design of algorithms. One combinatorial optimization problem recently tackled with ML is the Job Shop Scheduling Problem (JSP). Most works on the JSP using ML focus on Deep Reinforcement Learning (DRL), and only a few of them leverage supervised learning techniques. The recurring reasons for avoiding supervised learning seem to be the difficulty in casting the right learning task, i.e., what is meaningful to predict, and how to obtain labels. Therefore, we first propose a novel supervised learning task that aims at predicting the quality of machine permutations. Then, we design an original methodology to estimate this quality, and we use these estimations to create an accurate sequential deep learning model (binary accuracy above 95%). Finally, we empirically demonstrate the value of predicting the quality of machine permutations by enhancing the performance of a simple Tabu Search algorithm inspired by the works in the literature.
Submitted 16 September, 2022; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Unsupervised Detection and Clustering of Malicious TLS Flows
Authors:
Gibran Gomez,
Platon Kotzias,
Matteo Dell'Amico,
Leyla Bilge,
Juan Caballero
Abstract:
Malware abuses TLS to encrypt its malicious traffic, preventing examination by content signatures and deep packet inspection. Network detection of malicious TLS flows is an important, but challenging, problem. Prior works have proposed supervised machine learning detectors using TLS features. However, by trying to represent all malicious traffic, supervised binary detectors produce models that are too loose, thus introducing errors. Furthermore, they do not distinguish flows generated by different malware. On the other hand, supervised multi-class detectors produce tighter models and can classify flows by malware family, but require family labels, which are not available for many samples.
To address these limitations, this work proposes a novel unsupervised approach to detect and cluster malicious TLS flows. Our approach takes as input network traces from sandboxes. It clusters similar TLS flows using 90 features that capture properties of the TLS client, TLS server, certificate, and encrypted payload; and uses the clusters to build an unsupervised detector that can assign a malicious flow to the cluster it belongs to, or determine it is benign. We evaluate our approach using 972K traces from a commercial sandbox and 35M TLS flows from a research network. Our clustering shows very high precision and recall with an F1 score of 0.993. We compare our unsupervised detector with two state-of-the-art approaches, showing that it outperforms both. The false detection rate of our detector is 0.032% measured over four months of traffic.
Submitted 23 December, 2022; v1 submitted 8 September, 2021;
originally announced September 2021.
-
Benchmark Instances and Optimal Solutions for the Traveling Salesman Problem with Drone
Authors:
Mauro Dell'Amico,
Roberto Montemanni,
Stefano Novellani
Abstract:
The use of drones in logistics is attracting growing interest, and drones are becoming a more viable and common way of distributing parcels in an urban environment. As a consequence, there is a flourishing production of articles on the operational optimization of the combined use of trucks and drones for fulfilling customers' requests. The aim is to minimize the total time required to serve all the customers, since this has an obvious economic impact. However, the literature does not yet offer a widely recognized basic model, nor well-assessed sets of instances and optimal solutions that can be considered as a benchmark to prove the effectiveness of new solution methods. The aim of this paper is to fill this gap. On the one hand, we clearly describe some of the most common components of truck/drone routing problems and define nine basic problem settings by combining these components. On the other hand, we consider some of the instances used by many researchers and provide optimal solutions for all the problem settings previously identified. Instances and detailed solutions are then organized into benchmarks made publicly available as validation tools for future research methods.
Submitted 28 July, 2021;
originally announced July 2021.
-
Scheduling jobs with release dates on identical parallel machines by minimizing the total weighted completion time
Authors:
Arthur Kramer,
Mauro Dell'Amico,
Dominique Feillet,
Manuel Iori
Abstract:
This paper addresses the problem of scheduling a set of jobs that are released over time on a set of identical parallel machines, aiming at the minimization of the total weighted completion time. This problem, referred to as $P|r_j|\sum w_jC_j$, is of great importance in practice because it models a variety of real-life applications. Despite its importance, the $P|r_j|\sum w_jC_j$ problem has not received much attention in the recent literature. In this work, we fill this gap by proposing mixed integer linear programs and a tailored branch-and-price algorithm. Our branch-and-price algorithm relies on the decomposition of an arc-flow formulation and on the use of efficient exact and heuristic methods for solving the pricing subproblem. Computational experiments carried out on a set of randomly generated instances show that the proposed methods can solve instances with up to 200 jobs and 10 machines to proven optimality, and provide very low gaps for larger instances.
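To make the objective concrete, here is a minimal Python sketch that evaluates $\sum w_jC_j$ for a simple greedy list-scheduling heuristic: whenever a machine becomes free, it starts the released job with the largest ratio $w_j/p_j$ (a WSPT-style rule), idling until the next release if no job is available. The function name and the job data are hypothetical, and the heuristic is only an illustrative way to obtain a feasible schedule (an upper bound), not the paper's mixed integer programs or branch-and-price algorithm.

```python
import heapq


def wspt_list_schedule(jobs, n_machines):
    """jobs: list of (release r_j, processing time p_j, weight w_j).
    Non-preemptive greedy rule: the earliest-free machine starts the released
    job with the largest w_j / p_j (ties broken by lower index).
    Returns (completion times, total weighted completion time)."""
    machines = [0.0] * n_machines            # min-heap of machine-free times
    heapq.heapify(machines)
    unscheduled = set(range(len(jobs)))
    completion = [0.0] * len(jobs)
    while unscheduled:
        free_at = heapq.heappop(machines)
        released = [j for j in unscheduled if jobs[j][0] <= free_at]
        if not released:
            # No job is available yet: idle until the earliest pending release.
            free_at = min(jobs[j][0] for j in unscheduled)
            released = [j for j in unscheduled if jobs[j][0] <= free_at]
        j = max(released, key=lambda k: (jobs[k][2] / jobs[k][1], -k))
        completion[j] = free_at + jobs[j][1]
        unscheduled.remove(j)
        heapq.heappush(machines, completion[j])
    total = sum(w * c for (_, _, w), c in zip(jobs, completion))
    return completion, total


jobs = [(0, 3, 1), (0, 1, 2), (1, 2, 4), (2, 2, 1)]   # hypothetical (r_j, p_j, w_j)
completion, total = wspt_list_schedule(jobs, n_machines=2)
print(completion, total)  # [3.0, 1.0, 3.0, 5.0] 22.0
```

Because the rule is greedy, it always yields a feasible schedule but carries no optimality guarantee on parallel machines with release dates.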
Submitted 27 June, 2020; v1 submitted 20 May, 2020;
originally announced May 2020.
-
FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance
Authors:
Matteo Dell'Amico
Abstract:
FISHDBC is a flexible, incremental, scalable, and hierarchical density-based clustering algorithm. It is flexible because it empowers users to work on arbitrary data, skipping the feature extraction step that usually transforms raw data into numeric arrays and letting users define an arbitrary distance function instead. It is incremental and scalable: it avoids the $\mathcal O(n^2)$ performance of other approaches in non-metric spaces and requires only lightweight computation to update the clustering when a few items are added. It is hierarchical: it produces a "flat" clustering which can be expanded to a tree structure, so that users can group and/or divide clusters into sub- or super-clusters when data exploration requires it. It is density-based and approximates HDBSCAN*, an evolution of DBSCAN.
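Two ingredients mentioned above, an arbitrary user-supplied distance and an HDBSCAN*-style hierarchy, can be illustrated with a deliberately naive Python sketch. It is not FISHDBC: it computes all pairwise distances, i.e., exactly the $\mathcal O(n^2)$ cost that FISHDBC is designed to avoid, builds a minimum spanning tree over HDBSCAN*'s mutual reachability distance with Prim's algorithm, and then cuts it at a threshold `eps` to obtain one flat clustering. The Levenshtein distance, the core-distance convention (distance to the `min_samples`-th nearest other item), the function names and the word list are illustrative assumptions.

```python
def levenshtein(a, b):
    """Edit distance: an example of a non-numeric, user-defined distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def mst_mutual_reachability(items, dist, min_samples):
    """MST (as (i, j, weight) edges) of the complete mutual-reachability graph."""
    n = len(items)
    d = [[dist(x, y) for y in items] for x in items]   # O(n^2) distance calls
    # Core distance: distance to the min_samples-th nearest other item.
    core = [sorted(d[i][j] for j in range(n) if j != i)[min_samples - 1]
            for i in range(n)]

    def mreach(i, j):
        return max(core[i], core[j], d[i][j])

    # Prim's algorithm starting from item 0.
    best = {j: mreach(0, j) for j in range(1, n)}
    parent = {j: 0 for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=best.get)
        edges.append((parent.pop(j), j, best.pop(j)))
        for k in best:
            w = mreach(j, k)
            if w < best[k]:
                best[k], parent[k] = w, j
    return edges


def flat_clusters(n, edges, eps):
    """Drop MST edges heavier than eps; connected components become clusters."""
    label = list(range(n))

    def find(i):
        while label[i] != i:
            label[i] = label[label[i]]   # path halving
            i = label[i]
        return i

    for i, j, w in edges:
        if w <= eps:
            label[find(i)] = find(j)
    return [find(i) for i in range(n)]


words = ["apple", "appel", "aple", "banana", "bananna", "banan"]
edges = mst_mutual_reachability(words, levenshtein, min_samples=2)
print(flat_clusters(len(words), edges, eps=2))
```

Cutting the same tree at different `eps` values gives coarser or finer flat clusterings, which is the kind of hierarchy the abstract refers to.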
Submitted 16 October, 2019;
originally announced October 2019.
-
Models and algorithms for the Flying Sidekick Traveling Salesman Problem
Authors:
Mauro Dell'Amico,
Roberto Montemanni,
Stefano Novellani
Abstract:
This paper presents a set of new formulations for the Flying Sidekick Traveling Salesman Problem, where a truck and a drone cooperate to deliver parcels to customers while minimizing the completion time. The new formulations improve on results from the literature by solving to optimality several benchmark instances for which an optimal solution was previously unknown. A matheuristic algorithm, strongly based on the new models, is also discussed. Experimental results show that this method is able to provide good-quality solutions in a short time even for the larger instances, on which the mathematical models struggle to provide either good heuristic solutions or strong lower bounds.
Submitted 11 October, 2019; v1 submitted 4 October, 2019;
originally announced October 2019.
-
Scheduling With Inexact Job Sizes: The Merits of Shortest Processing Time First
Authors:
Matteo Dell'Amico
Abstract:
It is well known that size-based scheduling policies, which take into account job size (i.e., the time it takes to run them), can perform very well in terms of both response time and fairness. Unfortunately, the requirement of knowing the exact job size a priori is a major obstacle that is frequently insurmountable in practice. Often, it is possible to obtain a coarse estimate of job size, but analytical results with inexact job sizes are challenging to obtain, and simulation-based studies show that several size-based algorithms are severely impacted by job size estimation errors. For example, Shortest Remaining Processing Time (SRPT), which yields optimal mean sojourn time when job sizes are known exactly, can drastically underperform when it is fed inexact job sizes.
Some algorithms have been proposed to better handle size estimation errors, but they are somewhat complex, which makes their analysis challenging. We consider Shortest Processing Time (SPT), a simplification of SRPT that skips the update of the "remaining" job size and results in a preemptive algorithm that simply schedules the job with the shortest estimated processing time. When job sizes are inexact, SPT performs comparably to the best known algorithms for handling errors, while being considerably simpler. In this work, SPT is evaluated through simulation, showing near-optimal performance in many cases, with the hope that its simplicity can open the way to analytical evaluation even when inexact inputs are considered.
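The contrast between the two policies can be seen in a toy, unit-time simulation of a single preemptive server. Every detail of this Python sketch is hypothetical (the `Job` fields, the workload, tie-breaking by arrival time): SRPT fed with estimates ranks jobs by estimated remaining size, i.e., estimate minus service received so far, whereas SPT ranks them by the fixed estimate. A large job whose size is underestimated soon reaches a negative estimated remaining size under SRPT and then blocks every later arrival.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    arrival: int
    size: int                      # true size, hidden from the scheduler
    estimate: float                # size estimate seen by the scheduler
    attained: int = 0              # service received so far
    done_at: Optional[int] = None


def mean_sojourn(jobs: List[Job], priority: Callable[[Job], float]) -> float:
    """Preemptive single server in unit time steps: at each step, serve the
    arrived, unfinished job with the lowest priority value (ties: earliest arrival)."""
    t = 0
    while any(j.done_at is None for j in jobs):
        active = [j for j in jobs if j.arrival <= t and j.done_at is None]
        if active:
            job = min(active, key=lambda k: (priority(k), k.arrival))
            job.attained += 1
            if job.attained == job.size:
                job.done_at = t + 1
        t += 1
    return sum(j.done_at - j.arrival for j in jobs) / len(jobs)


def srpt_estimated(job: Job) -> float:
    return job.estimate - job.attained    # estimated remaining size (can go negative)


def spt(job: Job) -> float:
    return job.estimate                   # fixed estimated size


def workload() -> List[Job]:
    # One large job (true size 10) badly underestimated, then five small jobs.
    return [Job(0, 10, 1.0)] + [Job(t, 1, 0.5) for t in range(1, 6)]


print(mean_sojourn(workload(), srpt_estimated))  # 10.0
print(mean_sojourn(workload(), spt))             # 3.3333333333333335
```

In this constructed example the underestimated large job delays all five small jobs under SRPT, while SPT lets them overtake it; the example only illustrates the failure mode and says nothing about the general simulation results.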
Submitted 10 July, 2019;
originally announced July 2019.
-
Matheuristic algorithms for the parallel drone scheduling traveling salesman problem
Authors:
Mauro Dell'Amico,
Roberto Montemanni,
Stefano Novellani
Abstract:
In the near future, drones are likely to become a viable way of distributing parcels in an urban environment. In this paper we consider the parallel drone scheduling traveling salesman problem, where a set of customers requiring a delivery is split between a truck and a fleet of drones, with the aim of minimizing the total time required to service all the customers.
We present a set of matheuristic methods for the problem. The new approaches are validated via an experimental campaign on two sets of benchmarks available in the literature. It is shown that the approaches we propose perform very well on small/medium-size instances. Solving a mixed integer linear programming model to optimality yields the first optimality proofs for all the considered instances with 20 customers, while the heuristics are shown to be fast and effective on the same dataset. When considering larger instances with 48 to 229 customers, the results are competitive with state-of-the-art methods and lead to 28 new best known solutions out of the 90 instances considered.
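The objective can be made concrete by evaluating one candidate split of the customers. In this minimal Python sketch, the truck visits its customers in a given tour starting and ending at the depot, each drone delivery is a round trip from the depot, and the completion time is the maximum over all vehicles. Euclidean distances, constant speeds, the data, and the longest-processing-time (LPT) rule for assigning drone trips to drones are illustrative assumptions, not the paper's matheuristics.

```python
import math


def tour_time(depot, tour, speed):
    """Truck time: depot -> customers in the given order -> depot."""
    stops = [depot] + tour + [depot]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:])) / speed


def drone_makespan(depot, customers, n_drones, speed):
    """Each drone delivery is a depot round trip; assign trips with the LPT rule
    (longest remaining trip to the currently least-loaded drone). n_drones >= 1."""
    trips = sorted((2 * math.dist(depot, c) / speed for c in customers), reverse=True)
    loads = [0.0] * n_drones
    for trip in trips:
        loads[loads.index(min(loads))] += trip
    return max(loads)


def completion_time(depot, truck_tour, drone_customers, n_drones,
                    truck_speed, drone_speed):
    """Time at which the last vehicle is back at the depot."""
    return max(tour_time(depot, truck_tour, truck_speed),
               drone_makespan(depot, drone_customers, n_drones, drone_speed))


depot = (0.0, 0.0)
print(completion_time(depot, [(3, 0), (3, 4)], [(1, 0), (0, 2), (2, 0)],
                      n_drones=2, truck_speed=1.0, drone_speed=2.0))  # 12.0
```

A matheuristic would search over the split and the truck tour; this function only scores one given solution.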
Submitted 7 June, 2019;
originally announced June 2019.
-
Drone-assisted deliveries: new formulations for the Flying Sidekick Traveling Salesman Problem
Authors:
Mauro Dell'Amico,
Roberto Montemanni,
Stefano Novellani
Abstract:
In this paper we consider a problem related to deliveries assisted by an unmanned aerial vehicle, a so-called drone. In particular, we consider the Flying Sidekick Traveling Salesman Problem, where a truck and a drone cooperate to deliver parcels to customers while minimizing the completion time. We improve on the formulation found in the related literature by proposing three-indexed and two-indexed formulations and a set of inequalities that can be implemented in a branch-and-cut fashion. We find optimal solutions for most of the instances from the literature. Moreover, we consider two versions of the problem: one in which the drone is allowed to wait at the customers, as in the literature, and one where waiting is allowed only in flying mode. The solution methodologies are adapted to both versions, and a comparison between the two versions is provided.
Submitted 31 May, 2019;
originally announced May 2019.
-
The Supermarket Model with Known and Predicted Service Times
Authors:
Michael Mitzenmacher,
Matteo Dell'Amico
Abstract:
The supermarket model refers to a system with a large number of queues, where new customers choose d queues at random and join the one with the fewest customers. For load balancing systems, this model demonstrates the power of even small amounts of choice, compared to simply joining a queue chosen uniformly at random. In this work we perform simulation-based studies of variations where the service times of customers are predicted, as might be done in modern settings using machine learning techniques or related mechanisms. Our primary takeaway is that using even seemingly weak predictions of service times can yield significant benefits over blind First In First Out queueing in this context. However, some care must be taken when using predicted service time information both to choose a queue and to order elements for service within a queue: while in many cases using the information for both choosing and ordering is beneficial, in many of our simulation settings we find that, when predicted service times are used to order jobs within a queue, simply using the number of jobs to choose a queue is better. In our simulations, we evaluate both synthetic and real-world workloads; in the latter, service times are predicted by machine learning. Our results provide practical guidance for the design of real-world systems; moreover, we leave many natural theoretical open questions for future work, validating their relevance to real-world situations.
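The baseline supermarket model described in the first sentences above can be simulated in a few lines of Python. This sketch assumes the standard textbook setting (Poisson arrivals at total rate $\lambda n$, exponential unit-rate service at each queue, FIFO, no service-time predictions); the function name and parameters are hypothetical, and it does not reproduce the predicted-service-time variants studied in the paper. It uses uniformization, so every simulated event has the same expected duration and the per-event average of the number of customers approximates the time average.

```python
import random


def supermarket(n_queues, arrival_rate, d, n_events, seed=0):
    """Join-the-shortest-of-d-sampled-queues with Poisson arrivals (total rate
    arrival_rate * n_queues) and exp(1) service at each queue.
    Uniformized simulation: each event is an arrival with probability
    arrival_rate / (arrival_rate + 1), otherwise a potential departure at a
    uniformly random queue (a no-op if that queue is empty).
    Returns the average number of customers per queue."""
    rng = random.Random(seed)
    queues = [0] * n_queues
    total = 0        # customers currently in the system
    area = 0         # running sum of `total` over events
    p_arrival = arrival_rate / (arrival_rate + 1.0)
    for _ in range(n_events):
        if rng.random() < p_arrival:
            choices = rng.sample(range(n_queues), d)
            q = min(choices, key=queues.__getitem__)   # shortest sampled queue
            queues[q] += 1
            total += 1
        else:
            q = rng.randrange(n_queues)
            if queues[q] > 0:
                queues[q] -= 1
                total -= 1
        area += total
    return area / (n_events * n_queues)


results = {d: supermarket(100, 0.9, d, 200_000) for d in (1, 2)}
print(results)
```

With $\lambda = 0.9$, sampling $d = 2$ queues should give a markedly lower average queue length than $d = 1$ (purely random assignment); the exact figures depend on the seed and run length, since the simulation starts from empty queues.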
Submitted 17 February, 2022; v1 submitted 23 May, 2019;
originally announced May 2019.
-
A Branch-and-Price Algorithm for the Temporal Bin Packing Problem
Authors:
Mauro Dell'Amico,
Fabio Furini,
Manuel Iori
Abstract:
We study an extension of the classical Bin Packing Problem, where each item consumes the bin capacity during a given time window that depends on the item itself. The problem asks for the minimum number of bins needed to pack all the items while respecting the bin capacity at every time instant. A polynomial-size formulation, an exponential-size formulation, and a number of lower and upper bounds are studied. A branch-and-price algorithm for solving the exponential-size formulation is introduced. An overall algorithm combining the different methods is then proposed and tested through extensive computational experiments.
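A few lines of Python make the time-window capacity constraint concrete. The sketch is illustrative only (hypothetical function names and data, with half-open windows [start, end) as an assumption): it computes a simple lower bound, the maximum over time of the number of bins needed by the items active at that instant, and packs items greedily with a first-fit rule that checks a bin's peak load across each item's window. It is not one of the paper's formulations or its branch-and-price algorithm.

```python
import math


def load_at(items, t):
    """Total size of items active at time t; an item is (size, start, end)."""
    return sum(s for s, a, b in items if a <= t < b)


def peak_load(items, start, end):
    """Maximum load over [start, end): the load only increases at item starts."""
    points = [start] + [a for _, a, _ in items if start < a < end]
    return max(load_at(items, t) for t in points)


def lower_bound(items, capacity):
    """Bins needed at the busiest instant: max over t of ceil(load(t) / capacity)."""
    return max(math.ceil(load_at(items, a) / capacity) for _, a, _ in items)


def first_fit(items, capacity):
    """Place items by increasing start time into the first bin whose load stays
    within capacity throughout the item's window; open a new bin otherwise."""
    bins = []
    for item in sorted(items, key=lambda it: it[1]):
        size, a, b = item
        for content in bins:
            if peak_load(content, a, b) + size <= capacity:
                content.append(item)
                break
        else:
            bins.append([item])
    return bins


items = [(6, 0, 4), (5, 2, 6), (5, 5, 8), (4, 0, 3)]   # hypothetical (size, start, end)
print(lower_bound(items, 10), len(first_fit(items, 10)))  # 2 2
```

When the heuristic's bin count matches the lower bound, as in this toy instance, the packing is optimal; in general the two only bracket the optimum.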
Submitted 13 February, 2019;
originally announced February 2019.
-
Enhanced arc-flow formulations to minimize weighted completion time on identical parallel machines
Authors:
Arthur Kramer,
Mauro Dell'Amico,
Manuel Iori
Abstract:
We consider the problem of scheduling a set of jobs on a set of identical parallel machines, with the aim of minimizing the total weighted completion time. The problem has been addressed in the literature with a number of mathematical formulations, some of which require the implementation of tailored branch-and-price methods. In our work, we instead solve the problem by means of new arc-flow formulations, first representing it on a capacitated network and then invoking a mixed integer linear model with a pseudo-polynomial number of variables and constraints. According to our computational tests, existing formulations from the literature can solve benchmark instances with up to 100 jobs to proven optimality, whereas our best-performing arc-flow formulation solves all instances with up to 400 jobs and provides very low gaps for larger instances with up to 1000 jobs.
Submitted 31 August, 2018;
originally announced August 2018.
-
On Fair Size-Based Scheduling
Authors:
Matteo Dell'Amico,
Damiano Carra,
Pietro Michiardi
Abstract:
By executing jobs serially rather than in parallel, size-based scheduling policies can shorten the time needed to complete jobs; however, major obstacles to their applicability are fairness guarantees and the fact that job sizes are rarely known exactly a priori. Here, we introduce the Pri family of size-based scheduling policies; Pri simulates any reference scheduler and executes jobs in the order of their simulated completion. We show that these schedulers give strong fairness guarantees, since no job completes later in Pri than in the reference policy. In addition, we introduce PSBS, a practical implementation of such a scheduler: it works online (i.e., without needing knowledge of jobs submitted in the future), it has an efficient O(log n) implementation, and it allows setting priorities for jobs. Most importantly, unlike earlier size-based policies, the performance of PSBS degrades gracefully with errors, leading to performance that is close to optimal in a variety of realistic use cases.
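The Pri idea can be illustrated with a small Python sketch in which the reference scheduler is processor sharing (PS), the policy that FSP simulates. Assumptions: a single server, exact job sizes, and reference completion times precomputed offline from the whole arrival sequence for brevity; the function names are hypothetical, and PSBS itself works online and handles estimated sizes and weights, which this sketch does not.

```python
def ps_completion_times(arrivals, sizes):
    """Completion times under processor sharing: all present jobs share the server equally."""
    n = len(sizes)
    order = sorted(range(n), key=lambda i: arrivals[i])
    remaining = [float(s) for s in sizes]
    done = [0.0] * n
    active, t, k = [], 0.0, 0
    while k < n or active:
        if not active:                      # idle server: jump to the next arrival
            t = max(t, arrivals[order[k]])
            active.append(order[k])
            k += 1
            continue
        share = 1.0 / len(active)
        first = min(active, key=lambda i: remaining[i])
        t_finish = t + remaining[first] / share
        t_arrival = arrivals[order[k]] if k < n else float("inf")
        t_next = min(t_finish, t_arrival)
        for i in active:
            remaining[i] -= (t_next - t) * share
        t = t_next
        if t_finish <= t_arrival:
            active.remove(first)
            done[first] = t
        else:
            active.append(order[k])
            k += 1
    return done


def pri_completion_times(arrivals, sizes, reference):
    """Serve one job at a time (preemptively), always running the arrived,
    unfinished job that completes earliest in the reference schedule."""
    n = len(sizes)
    remaining = [float(s) for s in sizes]
    done = [None] * n
    t = 0.0
    while None in done:
        ready = [i for i in range(n) if arrivals[i] <= t and done[i] is None]
        upcoming = [arrivals[i] for i in range(n) if arrivals[i] > t]
        if not ready:
            t = min(upcoming)
            continue
        j = min(ready, key=lambda i: reference[i])
        run = min(remaining[j], min(upcoming, default=float("inf")) - t)
        remaining[j] -= run
        t += run
        if remaining[j] == 0:
            done[j] = t
    return done


arrivals, sizes = [0, 1, 2], [4, 1, 2]
ps = ps_completion_times(arrivals, sizes)
pri = pri_completion_times(arrivals, sizes, ps)
print(ps, pri)
```

On this toy input Pri completes the jobs at times 7, 2 and 4, against roughly 7, 3.5 and 6.5 under PS, so no job finishes later than in the reference schedule.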
Submitted 30 June, 2015;
originally announced June 2015.
-
PSBS: Practical Size-Based Scheduling
Authors:
Matteo Dell'Amico,
Damiano Carra,
Pietro Michiardi
Abstract:
Size-based schedulers have very desirable performance properties: optimal or near-optimal response time can be coupled with strong fairness guarantees. Despite this, such systems are very rarely implemented in practical settings, because they require knowing a priori the amount of work needed to complete jobs: this assumption is very difficult to satisfy in concrete systems. It is far more realistic to provide the system with an estimate of job sizes, but existing studies point to somewhat pessimistic results when existing scheduling policies are used with imprecise job size estimates. Our goal is to design scheduling policies that explicitly deal with inexact job sizes. First, we show that existing size-based schedulers can perform poorly with inexact job size information when job sizes are heavily skewed; we show that this issue, and the pessimistic results reported in the literature, are due to problematic behavior when large jobs are underestimated. Once the problem is identified, existing size-based schedulers can be amended to solve it. We generalize FSP, a fair and efficient size-based scheduling policy, to solve the problem highlighted above; in addition, our solution deals with different job weights (which can be assigned to a job independently of its size). We provide an efficient implementation of the resulting protocol, which we call Practical Size-Based Scheduler (PSBS). Through simulations on synthetic and real workloads, we show that PSBS has near-optimal performance in a large variety of cases with inaccurate size information, that it performs fairly, and that it handles job weights correctly. We believe that this work shows that PSBS is indeed practical, and we maintain that it could inspire the design of schedulers in a wide array of real-world use cases.
Submitted 6 August, 2015; v1 submitted 22 October, 2014;
originally announced October 2014.
-
On User Availability Prediction and Network Applications
Authors:
Matteo Dell'Amico,
Maurizio Filippone,
Pietro Michiardi,
Yves Roudier
Abstract:
User connectivity patterns in network applications are known to be heterogeneous and to follow periodic (daily and weekly) patterns. In many cases, the regularity and correlation of these patterns are problematic: for network applications, many connected users create peaks of demand; in contrast, in peer-to-peer scenarios, having few users online results in a scarcity of available resources. On the other hand, since connectivity patterns exhibit periodic behavior, they are to some extent predictable. This work shows how this can be exploited to anticipate future user connectivity and to have applications respond to it proactively. We evaluate the probability that any given user will be online at any given time, and assess the prediction on six-month availability traces from three different Internet applications. Building upon this, we show how our probabilistic approach makes it easy to evaluate performance in a number of diverse network application models, and to use these models to optimize systems. In particular, we show how this approach can be used in distributed hash tables, friend-to-friend storage, and cache pre-loading for social networks, resulting in substantial gains in data availability and system efficiency at negligible cost.
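As a rough illustration of exploiting periodic patterns, the following Python sketch estimates, for a single user, the probability of being online in each hour of the week from past samples, and reuses that weekly profile for future hours. The sample format (hour index counted from a Monday 00:00), the Laplace smoothing and the names `weekly_profile` and `predict_online` are assumptions made for illustration; this is not one of the predictors evaluated in the paper.

```python
# Hypothetical sketch: a per-user weekly availability profile.
# Assumption: observations are hourly samples (hour_index, online), where
# hour_index counts hours from a Monday 00:00, so hour_index % 168 is the
# hour-of-week slot. Not the authors' predictor.

HOURS_PER_WEEK = 168


def weekly_profile(observations, alpha=1.0):
    """Estimate P(online) for each hour-of-week slot from past samples.

    alpha is an additive (Laplace) smoothing pseudo-count, so slots that
    were never observed default to 0.5 instead of 0 or 1.
    """
    online = [0] * HOURS_PER_WEEK
    total = [0] * HOURS_PER_WEEK
    for hour_index, is_online in observations:
        slot = hour_index % HOURS_PER_WEEK
        total[slot] += 1
        online[slot] += int(is_online)
    return [(online[s] + alpha) / (total[s] + 2 * alpha) for s in range(HOURS_PER_WEEK)]


def predict_online(profile, future_hour_index):
    """P(user online) at a future hour, assuming the weekly pattern repeats."""
    return profile[future_hour_index % HOURS_PER_WEEK]


# Usage: a user seen online on Mondays at 09:00 and offline at 03:00 for four weeks.
history = [(week * HOURS_PER_WEEK + 9, True) for week in range(4)]
history += [(week * HOURS_PER_WEEK + 3, False) for week in range(4)]
profile = weekly_profile(history)
print(predict_online(profile, 10 * HOURS_PER_WEEK + 9))  # 5/6, about 0.833
```

Because the profile is a plain probability per time slot, it can be fed directly into availability computations for replicas or caches, which is the kind of use the abstract describes.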
Submitted 30 April, 2014;
originally announced April 2014.
-
Revisiting Size-Based Scheduling with Estimated Job Sizes
Authors:
Matteo Dell'Amico,
Damiano Carra,
Mario Pastorelli,
Pietro Michiardi
Abstract:
We study size-based schedulers, and focus on the impact of inaccurate job size information on response time and fairness. Our intent is to revisit previous results, which allude to performance degradation for even small errors on job size estimates, thus limiting the applicability of size-based schedulers.
We show that scheduling performance is tightly connected to workload characteristics: in the absence of large skew in the job size distribution, even extremely imprecise estimates suffice to outperform size-oblivious disciplines. Instead, when job sizes are heavily skewed, known size-based disciplines suffer.
In this context, we show -- for the first time -- the dichotomy of over-estimation versus under-estimation. The former is, in general, less problematic than the latter, as its effects are localized to individual jobs. Instead, under-estimation leads to severe problems that may affect a large number of jobs.
We present an approach to mitigate these problems: our technique requires no complex modifications to original scheduling policies and performs very well. To support our claim, we proceed with a simulation-based evaluation that covers an unprecedentedly large parameter space, which takes into account a variety of synthetic and real workloads.
As a consequence, we show that size-based scheduling is practical and outperforms alternatives in a wide array of use cases, even in the presence of inaccurate size information.
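The over-estimation versus under-estimation dichotomy can be reproduced with a toy preemptive scheduler that always serves the job with the smallest estimated remaining size. This is a hypothetical discrete-time sketch, not the authors' simulator or their mitigation technique; in particular, clamping the estimated remaining size at zero once a job has received more service than its estimate is an assumption about how an exhausted estimate behaves.

```python
# Toy, discrete-time illustration (not the authors' simulator): preemptive
# "shortest remaining estimated size" scheduling on one server.
# Assumptions: integer arrival times and sizes; the estimated remaining size
# is the estimate minus the service already received, clamped at zero.


def simulate(jobs):
    """jobs: list of (name, arrival, true_size, estimated_size).

    Returns a dict mapping each job name to its sojourn (response) time.
    """
    attained = {name: 0 for name, _, _, _ in jobs}
    sojourn = {}
    t = 0
    while len(sojourn) < len(jobs):
        ready = [j for j in jobs if j[1] <= t and j[0] not in sojourn]
        if not ready:
            t += 1
            continue
        # Serve the job that *looks* smallest; ties go to the earliest arrival.
        name, arrival, size, _estimate = min(
            ready, key=lambda j: (max(j[3] - attained[j[0]], 0), j[1])
        )
        attained[name] += 1  # one unit of service in [t, t + 1)
        t += 1
        if attained[name] == size:
            sojourn[name] = t - arrival
    return sojourn


# Under-estimation: a 100-unit job estimated at 10 "looks" finished after
# 10 units of service, so it blocks every small job that arrives later.
print(simulate([("big", 0, 100, 10), ("s1", 20, 5, 5), ("s2", 30, 5, 5), ("s3", 40, 5, 5)]))
# {'big': 100, 's1': 85, 's2': 80, 's3': 75}

# Over-estimation: a 5-unit job estimated at 100 only delays itself.
print(simulate([("x", 0, 5, 100), ("a", 1, 5, 5), ("b", 2, 5, 5), ("c", 3, 5, 5)]))
# {'a': 5, 'b': 9, 'c': 13, 'x': 20}
```

In the first run all three small jobs wait behind the under-estimated job; in the second, the penalty of the wrong estimate falls on the over-estimated job `x`, which finishes last.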
Submitted 25 July, 2014; v1 submitted 24 March, 2014;
originally announced March 2014.
-
OS-Assisted Task Preemption for Hadoop
Authors:
Mario Pastorelli,
Matteo Dell'Amico,
Pietro Michiardi
Abstract:
This work introduces a new task preemption primitive for Hadoop, which allows tasks to be suspended and resumed by exploiting existing memory management mechanisms readily available in modern operating systems. Our technique fills the gap between the two extreme cases of killing tasks (which wastes work) and waiting for their completion (which introduces latency): experimental results indicate superior performance and very small overheads when compared to existing alternatives.
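To make the "suspend instead of kill" idea concrete, here is a minimal Python sketch that pauses and resumes a running task process with the POSIX signals SIGSTOP and SIGCONT. It assumes a POSIX system; the names `suspend_task` and `resume_task` are hypothetical helpers, not Hadoop APIs, and the sketch is not claimed to be the exact mechanism used in the paper. A stopped process keeps its state in memory, and the operating system is free to page that memory out under pressure, so resuming continues the work rather than restarting it.

```python
# Minimal sketch, assuming a POSIX system: suspend and resume a running task
# process with SIGSTOP/SIGCONT. While stopped, the process keeps its state and
# the OS may page its memory out under pressure; resuming continues the work
# instead of restarting it. Function names are hypothetical, not Hadoop APIs.
import signal
import subprocess
import sys
import time


def suspend_task(proc: subprocess.Popen) -> None:
    """Pause the task without losing completed work (unlike killing it)."""
    proc.send_signal(signal.SIGSTOP)


def resume_task(proc: subprocess.Popen) -> None:
    """Continue a previously suspended task from where it stopped."""
    proc.send_signal(signal.SIGCONT)


if __name__ == "__main__":
    # A stand-in "task": a child Python process that just sleeps.
    task = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    suspend_task(task)
    time.sleep(0.1)
    print("still alive while suspended:", task.poll() is None)
    resume_task(task)
    task.terminate()
    task.wait()
```

Killing the task would free its resources immediately but discard its progress; waiting would keep the progress but delay the preempting job. Suspension trades a bounded amount of memory (or swap) for keeping that progress.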
Submitted 10 February, 2014;
originally announced February 2014.
-
A Simulator for Data-Intensive Job Scheduling
Authors:
Matteo Dell'Amico
Abstract:
Despite the fact that size-based schedulers can give excellent results in terms of both average response times and fairness, data-intensive computing execution engines generally do not employ size-based schedulers, mainly because job size is not known a priori.
In this work, we perform a simulation-based analysis of the performance of size-based schedulers when they are employed with typical data-intensive workloads and with approximate size estimates. We show very promising results: even when size estimation is very imprecise, response times of size-based schedulers can be considerably smaller than those of simple scheduling techniques such as processor sharing or FIFO.
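A back-of-the-envelope version of this comparison fits in a few lines if all jobs arrive together. The sketch below makes that simplifying assumption (it is not the paper's setting or simulator) and compares FIFO, processor sharing and shortest-estimated-job-first with deliberately wrong estimates; the workload numbers are made up for illustration. For processor sharing with a common start time, job i completes at the sum over all jobs j of min(s_j, s_i).

```python
# Toy comparison (assumption: all jobs arrive at time 0 on one server; this
# is not the paper's simulator). Estimates are deliberately inaccurate.


def mean(values):
    return sum(values) / len(values)


def fifo_response(sizes):
    """Serve jobs in list (arrival) order; response time = completion time."""
    t, out = 0.0, []
    for s in sizes:
        t += s
        out.append(t)
    return out


def ps_response(sizes):
    """Processor sharing with a common start: job i ends at sum_j min(s_j, s_i)."""
    return [sum(min(sj, si) for sj in sizes) for si in sizes]


def sjf_estimated_response(sizes, estimates):
    """Serve jobs in order of *estimated* size, paying their *true* sizes."""
    order = sorted(range(len(sizes)), key=lambda i: estimates[i])
    t, out = 0.0, [0.0] * len(sizes)
    for i in order:
        t += sizes[i]
        out[i] = t
    return out


sizes = [100, 1, 1, 1, 1]         # heavily skewed: the large job arrives first
estimates = [40, 3, 0.5, 2, 1.5]  # every estimate is off, by a factor of 1.5 to 3
print(mean(fifo_response(sizes)))                      # 102.0
print(mean(ps_response(sizes)))                        # 24.8
print(mean(sjf_estimated_response(sizes, estimates)))  # 22.8
```

The size-based order wins here despite every estimate being wrong, because the large job is still estimated to be the largest one; estimates that ranked it below a small job would change the picture.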
Submitted 21 August, 2013; v1 submitted 25 June, 2013;
originally announced June 2013.
-
Practical Size-based Scheduling for MapReduce Workloads
Authors:
Mario Pastorelli,
Antonio Barbuzzi,
Damiano Carra,
Matteo Dell'Amico,
Pietro Michiardi
Abstract:
We present the Hadoop Fair Sojourn Protocol (HFSP) scheduler, which implements a size-based scheduling discipline for Hadoop. The benefits of size-based scheduling disciplines are well recognized in a variety of contexts (computer networks, operating systems, etc.), yet their practical implementation for a system such as Hadoop raises a number of important challenges. With HFSP, which is available as an open-source project, we address issues related to job size estimation and resource management, and study the effects of a variety of preemption strategies. Although the architecture underlying HFSP is suitable for any size-based scheduling discipline, in this work we revisit and extend the Fair Sojourn Protocol, which solves problems related to job starvation that affect FIFO, Processor Sharing and a range of size-based disciplines. Our experiments, in which we compare HFSP to standard Hadoop schedulers, point to a significant decrease in average job sojourn times (a metric that accounts for the total time a job spends in the system, including waiting and service times) for realistic workloads that we generate according to production traces available in the literature.
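The core idea of the Fair Sojourn Protocol can be sketched as follows, assuming exact job sizes and a single server: simulate processor sharing (PS) virtually to obtain each job's virtual finish time, then in the real system preemptively serve the pending job that would finish first in that virtual system. This Python sketch is written from the general description of FSP, not from the HFSP code base, and it ignores the job size estimation and Hadoop-specific issues mentioned in the abstract; the function names are illustrative.

```python
# Sketch of the Fair Sojourn Protocol idea (known sizes, single server),
# written from its general description rather than from HFSP's code:
# 1) simulate processor sharing (PS) "virtually" to get virtual finish times;
# 2) preemptively serve the pending job with the earliest virtual finish time.
# Fractions keep the arithmetic exact.
import heapq
from fractions import Fraction


def ps_finish_times(jobs):
    """jobs: list of (name, arrival, size). Returns virtual PS finish times."""
    pending = sorted(jobs, key=lambda j: j[1])
    active = {}  # name -> remaining size in the virtual PS system
    finish, t, i = {}, Fraction(0), 0
    while i < len(pending) or active:
        next_arrival = Fraction(pending[i][1]) if i < len(pending) else None
        if not active:
            t = next_arrival
        else:
            n = len(active)
            smallest = min(active.values())
            t_done = t + smallest * n  # each active job is served at rate 1/n
            if next_arrival is None or t_done <= next_arrival:
                for name in active:
                    active[name] -= smallest
                for name in [k for k, r in active.items() if r == 0]:
                    finish[name] = t_done
                    del active[name]
                t = t_done
                continue
            for name in active:
                active[name] -= (next_arrival - t) / n
            t = next_arrival
        # Admit every job arriving at time t.
        while i < len(pending) and pending[i][1] == t:
            active[pending[i][0]] = Fraction(pending[i][2])
            i += 1
    return finish


def fsp_completion_times(jobs):
    """Preemptive single-server schedule ordered by virtual PS finish time."""
    virtual = ps_finish_times(jobs)
    arrivals = sorted(jobs, key=lambda j: j[1])
    ready = []  # heap of (virtual_finish, name, remaining_size)
    done, t, i = {}, Fraction(0), 0
    while i < len(arrivals) or ready:
        if not ready:
            t = max(t, Fraction(arrivals[i][1]))
        while i < len(arrivals) and arrivals[i][1] <= t:
            name, _, size = arrivals[i]
            heapq.heappush(ready, (virtual[name], name, Fraction(size)))
            i += 1
        key, name, remaining = heapq.heappop(ready)
        next_arrival = Fraction(arrivals[i][1]) if i < len(arrivals) else None
        if next_arrival is None or t + remaining <= next_arrival:
            t += remaining
            done[name] = t
        else:
            # Preemption point: a new arrival may have an earlier virtual finish.
            heapq.heappush(ready, (key, name, remaining - (next_arrival - t)))
            t = next_arrival
    return done


jobs = [("A", 0, 10), ("B", 1, 2)]
print({k: str(v) for k, v in ps_finish_times(jobs).items()})       # {'B': '5', 'A': '12'}
print({k: str(v) for k, v in fsp_completion_times(jobs).items()})  # {'B': '3', 'A': '12'}
```

Job B finishes at time 3 instead of 5, while A still finishes at 12; this is consistent with the known property of FSP that no job completes later than it would under processor sharing, which is what rules out starvation.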
Submitted 3 May, 2013; v1 submitted 12 February, 2013;
originally announced February 2013.
-
Adaptive Redundancy Management for Durable P2P Backup
Authors:
Matteo Dell'Amico,
Pietro Michiardi,
Laszlo Toka,
Pasquale Cataldi
Abstract:
We design and analyze the performance of a redundancy management mechanism for Peer-to-Peer backup applications. Armed with the realization that a backup system has peculiar requirements -- namely, data is read over the network only during restore processes caused by data loss -- redundancy management targets data durability rather than attempting to make each piece of information available at any time.
In our approach each peer determines, in an on-line manner, an amount of redundancy sufficient to counter the effects of peer deaths, while preserving acceptable data restore times. Our experiments, based on trace-driven simulations, indicate that our mechanism can reduce the redundancy by a factor between two and three with respect to redundancy policies aiming for data availability. These results imply a corresponding increase in storage capacity and decrease in time to complete backups, at the expense of longer times required to restore data. We believe this is a very reasonable price to pay, given the nature of the application.
We conclude with a discussion of practical issues, and their solutions, concerning which encoding technique is most suitable to support our scheme.
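For contrast, the following sketch sizes redundancy in the availability-oriented way that the abstract compares against; it does not implement the authors' adaptive, durability-oriented mechanism. It assumes a k-out-of-n erasure code (any k of the n stored fragments suffice to restore the data) and independent peers that are each online with probability p; both the model and the function names are illustrative assumptions.

```python
# Availability-oriented sizing of erasure-coded redundancy: the kind of policy
# the abstract compares against, not the authors' mechanism. Assumptions:
# data is encoded into n fragments, any k of which suffice to restore it;
# each storing peer is online independently with probability p.
from math import comb


def availability(n, k, p):
    """P(at least k of n independent peers are online)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_fragments(k, p, target, n_max=1000):
    """Smallest n with availability(n, k, p) >= target, or None if n_max is not enough."""
    for n in range(k, n_max + 1):
        if availability(n, k, p) >= target:
            return n
    return None


# Plain replication is the k = 1 case: with peers online half of the time,
# 7 copies are needed for 99% availability (1 - 0.5**7 = 0.9921875).
print(min_fragments(k=1, p=0.5, target=0.99))  # 7
```

Under this model the required n grows quickly as p decreases, which gives one intuition for why keeping data available at all times can cost much more redundancy than keeping it durable.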
Submitted 17 January, 2014; v1 submitted 11 January, 2012;
originally announced January 2012.
-
Back To The Future: On Predicting User Uptime
Authors:
Matteo Dell'Amico,
Pietro Michiardi,
Yves Roudier
Abstract:
Correlation in user connectivity patterns is generally considered a problem for system designers, since it results in peaks of demand and also in the scarcity of resources for peer-to-peer applications. The other side of the coin is that these connectivity patterns are often predictable and that, to some extent, they can be dealt with proactively.
In this work, we build predictors aiming to determine the probability that any given user will be online at any given time in the future. We evaluate the quality of these predictors on various large traces from instant messaging and file sharing applications.
We also illustrate how availability prediction can be applied to enhance the behavior of peer-to-peer applications: we show through simulation how data availability is substantially increased in a distributed hash table simply by adjusting data placement policies according to peer availability prediction and without requiring any additional storage from any peer.
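One way to picture availability-aware data placement is a greedy choice of replica holders based on predicted weekly profiles. In the hypothetical sketch below, each peer has a predicted P(online) for every hour of the week, and k holders are picked to maximize the average probability that at least one holder is online. Peer independence, the greedy rule and the names `coverage` and `greedy_placement` are assumptions for illustration, not the placement policy evaluated in the paper.

```python
# Hypothetical sketch of availability-aware placement: given a predicted
# weekly profile per peer (P(online) for each hour slot), greedily pick k
# replica holders maximizing the average P(at least one holder online).
# Assumption: peers are online independently. Not the paper's policy.


def coverage(chosen, profiles):
    """Average over time slots of P(at least one chosen peer is online)."""
    slots = len(next(iter(profiles.values())))
    total = 0.0
    for h in range(slots):
        p_all_offline = 1.0
        for peer in chosen:
            p_all_offline *= 1.0 - profiles[peer][h]
        total += 1.0 - p_all_offline
    return total / slots


def greedy_placement(profiles, k):
    """Add, one at a time, the peer that most increases coverage."""
    chosen = []
    for _ in range(min(k, len(profiles))):
        best = max(
            (peer for peer in profiles if peer not in chosen),
            key=lambda peer: coverage(chosen + [peer], profiles),
        )
        chosen.append(best)
    return chosen


# Two "daytime" peers and one "nighttime" peer over a 168-hour week.
day = [0.9] * 84 + [0.2] * 84
night = [0.2] * 84 + [0.8] * 84
profiles = {"day": day, "day2": list(day), "night": night}
print(greedy_placement(profiles, 2))  # ['day', 'night']
```

Picking the two individually most available peers (`day` and `day2`) would give an average coverage of about 0.675, whereas the complementary pair reaches about 0.88 with the same amount of storage.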
Submitted 4 October, 2010;
originally announced October 2010.
-
On Scheduling and Redundancy for P2P Backup
Authors:
Laszlo Toka,
Matteo Dell'Amico,
Pietro Michiardi
Abstract:
An online backup system should be quick and reliable in both saving and restoring users' data. To do so in a peer-to-peer implementation, data transfer scheduling and the amount of redundancy must be chosen wisely. We formalize the problem of exchanging multiple pieces of data with intermittently available peers, and we show that random scheduling completes transfers nearly optimally in terms of duration as long as the system is sufficiently large. Moreover, we propose an adaptive redundancy scheme that improves performance and decreases resource usage while keeping the risks of data loss low. Extensive simulations show that our techniques are effective in a realistic trace-driven scenario with heterogeneous bandwidth.
Submitted 7 September, 2010;
originally announced September 2010.
-
Measuring Password Strength: An Empirical Analysis
Authors:
Matteo Dell'Amico,
Pietro Michiardi,
Yves Roudier
Abstract:
We present an in-depth analysis of the strength of almost 10,000 passwords from users of an instant messaging server in Italy. We estimate the strength of those passwords, and compare the effectiveness of state-of-the-art attack methods such as dictionaries and Markov chain-based techniques.
We show that the strength of passwords chosen by users varies enormously, and that the cost of attacks based on password strength grows very quickly when the attacker wants to obtain a higher success percentage. In accordance with existing studies, we observe that, in the absence of measures for enforcing password strength, weak passwords are common. On the other hand, we find that there will always be a subset of users with extremely strong passwords that are very unlikely to be broken.
The results of our study will help in evaluating the security of password-based authentication means, and they provide important insights for inspiring new and better proactive password checkers and password recovery tools.
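To illustrate how a Markov chain-based technique can score passwords, the following sketch trains a first-order (character bigram) model on a list of known passwords and reports strength as -log2 of the probability the model assigns to a password: the fewer bits, the earlier an attacker enumerating likely candidates would reach it. The add-one smoothing, the printable-ASCII alphabet, the start/end symbols and the names `train` and `strength_bits` are assumptions for illustration; this is not the exact model used in the paper.

```python
# Minimal sketch of a first-order (character bigram) Markov model for password
# strength; not the paper's exact model. Assumptions: printable-ASCII
# passwords, add-one smoothing, explicit start and end symbols.
# Strength = -log2 P(password): more bits means less likely, harder to guess.
import math
from collections import Counter

START, END = "\x02", "\x03"  # sentinel symbols outside printable ASCII
ALPHABET_SIZE = 95 + 1       # 95 printable ASCII characters plus END


def train(passwords):
    """Count character transitions, including start and end transitions."""
    pair_counts, state_counts = Counter(), Counter()
    for pw in passwords:
        symbols = [START] + list(pw) + [END]
        for prev, nxt in zip(symbols, symbols[1:]):
            pair_counts[prev, nxt] += 1
            state_counts[prev] += 1
    return pair_counts, state_counts


def strength_bits(password, model):
    """-log2 P(password) under the smoothed bigram model."""
    pair_counts, state_counts = model
    symbols = [START] + list(password) + [END]
    bits = 0.0
    for prev, nxt in zip(symbols, symbols[1:]):
        p = (pair_counts[prev, nxt] + 1) / (state_counts[prev] + ALPHABET_SIZE)
        bits -= math.log2(p)
    return bits


model = train(["password", "password1", "passw0rd"])
print(round(strength_bits("password", model), 1))  # lower: common transitions
print(round(strength_bits("zq8#Lk!x", model), 1))  # higher: unseen transitions
```

A real estimator would be trained on a much larger corpus; with only three training passwords, the sketch merely shows that common patterns receive far fewer bits than random-looking strings of similar length.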
Submitted 20 July, 2009;
originally announced July 2009.