-
The Web unpacked: a quantitative analysis of global Web usage
Authors:
Henrique S. Xavier
Abstract:
This paper presents a comprehensive analysis of global web usage patterns based on data from SimilarWeb, a leading source for estimating web traffic. Leveraging a dataset comprising over 250,000 websites, we estimate the total web traffic and investigate its distribution among domains and industry sectors. We detail the characteristics of the top 116 domains, which comprise an estimated one-third of all web traffic. We examine various attributes of these domains, including their content sources and types, access requirements, offline presence, and ownership features. Our analysis reveals a significant concentration of web traffic, with a small number of top websites capturing the majority of visits. Search engines, news and media, social networks, streaming, and adult content emerge as primary attractors of web traffic, which is also highly concentrated on platforms and USA-owned websites. Much of the traffic goes to for-profit but mostly free-of-charge websites, highlighting the dominance of business models not based on paywalls.
Submitted 25 April, 2024;
originally announced April 2024.
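The concentration finding above can be made concrete with a short calculation. The Python sketch below assumes a plain mapping from domain to estimated visits (the function name, domains, and numbers are hypothetical, not SimilarWeb data or its API) and finds the smallest number of top domains whose combined traffic reaches a given share, the kind of figure behind "the top 116 domains comprise an estimated one-third of all web traffic".

```python
def top_k_for_share(visits, target_share):
    """Smallest k such that the k most-visited domains account for at least
    `target_share` of all visits in `visits` (a domain -> visit-count mapping)."""
    if not 0 < target_share <= 1:
        raise ValueError("target_share must be in (0, 1]")
    counts = sorted(visits.values(), reverse=True)  # largest domains first
    total = sum(counts)
    if total <= 0:
        raise ValueError("total visits must be positive")
    cumulative = 0
    for k, count in enumerate(counts, start=1):
        cumulative += count
        if cumulative >= target_share * total:
            return k
    return len(counts)


# Hypothetical visit estimates, not SimilarWeb figures.
visits = {"a.example": 500, "b.example": 200, "c.example": 150,
          "d.example": 100, "e.example": 50}
print(top_k_for_share(visits, 1 / 3))  # → 1
```

Sorting once and accumulating keeps the calculation linear after the sort; with real data one would feed in per-domain traffic estimates instead of the toy dictionary.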
-
A Bayesian framework for measuring association and its application to emotional dynamics in Web discourse
Authors:
Henrique S. Xavier,
Diogo Cortiz,
Mateus Silvestrin,
Ana Luísa Freitas,
Letícia Yumi Nakao Morello,
Fernanda Naomi Pantaleão,
Gabriel Gaudencio do Rêgo
Abstract:
This paper introduces a Bayesian framework designed to measure the degree of association between categorical random variables. The method is grounded in the formal definition of variable independence and is implemented using Markov Chain Monte Carlo (MCMC) techniques. Unlike commonly employed techniques in Association Rule Learning, this approach enables a clear and precise estimation of confidence intervals and the statistical significance of the measured degree of association. We applied the method to non-exclusive emotions identified by annotators in 4,613 tweets written in Portuguese. This analysis revealed pairs of emotions that exhibit associations and mutually opposed pairs. Moreover, the method identifies hierarchical relations between categories, a feature observed in our data, and is utilized to cluster emotions into basic-level groups.
Submitted 11 March, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
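As a rough illustration of measuring association against the definition of independence, the Python sketch below treats two non-exclusive labels A and B (for example, two emotions) as binary variables and samples the posterior of r = P(A, B) / (P(A) P(B)), which equals 1 exactly when A and B are independent. This is an assumption-laden stand-in, not the paper's method: it uses a uniform Dirichlet prior on the 2×2 table and samples the conjugate posterior directly instead of running MCMC, and the function names and counts are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """One Dirichlet draw via normalised Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def association_posterior(n11, n10, n01, n00, n_samples=20000, seed=0):
    """Posterior samples of r = P(A,B) / (P(A) P(B)) for a 2x2 count table.

    Assumes a Dirichlet(1, 1, 1, 1) prior on the cell probabilities
    (hypothetical choice). r = 1 means independence, r > 1 means A and B
    co-occur more than chance, r < 1 means they tend to exclude each other.
    """
    rng = random.Random(seed)
    alphas = [n11 + 1, n10 + 1, n01 + 1, n00 + 1]  # prior + observed counts
    samples = []
    for _ in range(n_samples):
        p11, p10, p01, _p00 = sample_dirichlet(alphas, rng)
        p_a = p11 + p10  # marginal P(A)
        p_b = p11 + p01  # marginal P(B)
        samples.append(p11 / (p_a * p_b))
    return samples


def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior samples."""
    ordered = sorted(samples)
    lo_idx = int((1 - level) / 2 * len(ordered))
    hi_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical counts: items with both labels, only A, only B, neither.
samples = association_posterior(n11=60, n10=40, n01=40, n00=860)
low, high = credible_interval(samples)
print(f"95% credible interval for r: [{low:.2f}, {high:.2f}]")
```

An interval lying entirely above 1 suggests a positive association and one entirely below 1 suggests mutually opposed labels; an interval containing 1 is consistent with independence.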
-
Assessing Electricity Service Unfairness with Transfer Counterfactual Learning
Authors:
Song Wei,
Xiangrui Kong,
Alinson Santos Xavier,
Shixiang Zhu,
Yao Xie,
Feng Qiu
Abstract:
Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in counterfactual effects, and limited data availability. First, this paper demonstrates how one can evaluate counterfactual unfairness in a power system by analyzing the average causal effect of a specific protected attribute. Subsequently, we use subgroup analysis to handle model heterogeneity and introduce a novel method for estimating counterfactual unfairness based on transfer learning, which helps to alleviate the data scarcity in each subgroup. In our numerical analysis, we apply our method to a unique large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages under both daily and post-disaster operations, and such discrimination is exacerbated under severe conditions. These findings suggest a widespread, systematic issue of injustice in the power service systems and emphasize the necessity for focused interventions in disadvantaged communities.
Submitted 24 January, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
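The abstract frames unfairness as the average causal effect of a protected attribute in the presence of confounders. A minimal Python sketch of that quantity, using the standard stratification (back-door adjustment) estimator rather than the paper's transfer-learning method, is shown below. It assumes a single discrete confounder Z, a binary attribute T (for instance, a low-income indicator), and an outcome Y such as outage hours; the function name and records are hypothetical.

```python
from collections import defaultdict


def adjusted_ace(records):
    """Average causal effect of binary T on Y, adjusting for discrete Z:

        ACE = sum_z P(Z = z) * (E[Y | T = 1, Z = z] - E[Y | T = 0, Z = z])

    `records` is an iterable of (z, t, y) tuples with t in {0, 1}. Strata that
    lack either treated or untreated units are skipped and the remaining
    weights renormalised (a simplifying assumption).
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # z -> [sum of y for t=0, for t=1]
    counts = defaultdict(lambda: [0, 0])    # z -> [count for t=0, for t=1]
    for z, t, y in records:
        sums[z][t] += y
        counts[z][t] += 1
    effect, weight = 0.0, 0
    for z, (n0, n1) in counts.items():
        if n0 == 0 or n1 == 0:
            continue
        diff = sums[z][1] / n1 - sums[z][0] / n0  # within-stratum contrast
        effect += (n0 + n1) * diff                # weight by stratum size
        weight += n0 + n1
    if weight == 0:
        raise ValueError("no stratum contains both groups")
    return effect / weight


# Hypothetical (area type, low-income flag, outage hours) records.
records = [
    ("urban", 0, 2.0), ("urban", 0, 3.0), ("urban", 0, 4.0), ("urban", 1, 5.0),
    ("rural", 0, 7.0), ("rural", 1, 12.0), ("rural", 1, 12.0), ("rural", 1, 12.0),
]
print(adjusted_ace(records))  # → 3.5
```

In this toy data the naive difference in group means is 6.25 hours, because the flagged units are concentrated in the stratum with longer outages; adjusting for the confounder gives 3.5.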
-
On Modeling Network Slicing Communication Resources with SARSA Optimization
Authors:
Eduardo S. Xavier,
Nazim Agoulmine,
Joberto S. B. Martins
Abstract:
Network slicing is a crucial enabler for the composition and deployment of virtual network infrastructures required by the dynamic behavior of networks such as 5G/6G mobile networks, IoT-aware networks, e-health systems, and industry verticals such as the Internet of Vehicles (IoV) and Industry 4.0. The communication slices and their allocated communication resources are essential in slicing architectures for resource orchestration and allocation, virtual network function (VNF) deployment, and slice operation functionalities. The communication slices provide the communication capabilities required to support slice operation, SLA guarantees, and QoS/QoE application requirements. Therefore, this contribution proposes a network slicing conceptual model to formulate the optimization problem related to the sharing of communication resources among communication slices. First, we present a conceptual model of network slicing; we then analytically formulate some aspects of the model and the optimization problem to be addressed. Next, we propose using a SARSA agent to solve the problem and implement a proof-of-concept prototype. Finally, we present and discuss the results obtained.
Submitted 11 January, 2023;
originally announced January 2023.
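For readers unfamiliar with SARSA, the Python sketch below shows the standard tabular update Q(s, a) ← Q(s, a) + α[r + γQ(s', a') − Q(s, a)], where a' is the action the ε-greedy policy actually takes next. The environment is a hypothetical toy, not the paper's slicing model: a single slice whose demand cycles through four levels, an action that allocates 0 to 3 resource units, and a reward that penalises any mismatch between allocation and demand.

```python
import random

N_LEVELS = 4  # hypothetical: demand levels 0..3, also the allowed allocations


def reward(demand, allocation):
    # Hypothetical reward: penalise over- and under-provisioning equally.
    return -abs(allocation - demand)


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(N_LEVELS)
    return max(range(N_LEVELS), key=lambda a: q[state][a])


def train_sarsa(steps=20000, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_LEVELS for _ in range(N_LEVELS)]
    state = 0
    action = epsilon_greedy(q, state, epsilon, rng)
    for _ in range(steps):
        r = reward(state, action)
        next_state = (state + 1) % N_LEVELS  # hypothetical cyclic demand pattern
        next_action = epsilon_greedy(q, next_state, epsilon, rng)
        # SARSA (on-policy): bootstrap on the action actually chosen next.
        q[state][action] += alpha * (
            r + gamma * q[next_state][next_action] - q[state][action]
        )
        state, action = next_state, next_action
    return q


q = train_sarsa()
policy = [max(range(N_LEVELS), key=lambda a: q[s][a]) for s in range(N_LEVELS)]
print(policy)
```

The learned greedy policy should allocate exactly the demanded number of units in each state. Because SARSA bootstraps on the exploratory next action rather than the best one, its value estimates account for the cost of the agent's own exploration.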
-
Learning to Solve Large-Scale Security-Constrained Unit Commitment Problems
Authors:
Alinson S. Xavier,
Feng Qiu,
Shabbir Ahmed
Abstract:
Security-Constrained Unit Commitment (SCUC) is a fundamental problem in power systems and electricity markets. In practical settings, SCUC is repeatedly solved via Mixed-Integer Linear Programming (MIP), sometimes multiple times per day, with only minor changes in input data. In this work, we propose a number of machine learning (ML) techniques to effectively extract information from previously solved instances in order to significantly improve the computational performance of MIP solvers when solving similar instances in the future. Based on statistical data, we predict redundant constraints in the formulation, good initial feasible solutions, and affine subspaces where the optimal solution is likely to lie, leading to a significant reduction in problem size. Computational results on a diverse set of realistic and large-scale instances show that, using the proposed techniques, SCUC can be solved on average 4.3x faster with optimality guarantees, and 10.2x faster without optimality guarantees but with no observed reduction in solution quality. Out-of-distribution experiments provide evidence that the method is somewhat robust against dataset shift.
Submitted 18 December, 2019; v1 submitted 4 February, 2019;
originally announced February 2019.
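One of the ideas in the abstract, predicting redundant constraints from previously solved instances, can be sketched with a simple frequency rule. The Python code below is a hypothetical illustration, not the paper's predictor. It keeps a constraint only if it was active in at least a threshold fraction of past instances, and then, as an assumed safeguard, re-adds any omitted constraint that the reduced solution violates and re-solves. The `solve` and `find_violated` callables stand in for a real MIP solve and a feasibility check.

```python
from collections import Counter


def predict_relevant_constraints(history, threshold=0.05):
    """Keep constraint ids that were active (binding or violated) in at least
    `threshold` of past instances; the rest are predicted redundant."""
    if not history:
        return set()
    counts = Counter()
    for active in history:
        counts.update(active)
    return {c for c, k in counts.items() if k / len(history) >= threshold}


def solve_with_screening(solve, find_violated, initial, max_rounds=10):
    """Solve a reduced model, add back any omitted constraint the solution
    violates, and re-solve until no violations remain.

    solve(active_ids) -> solution; find_violated(solution) -> set of ids.
    """
    active = set(initial)
    for _ in range(max_rounds):
        solution = solve(active)
        missing = find_violated(solution) - active
        if not missing:
            return solution, active
        active |= missing
    raise RuntimeError("constraint screening did not converge")


# Hypothetical history of active transmission constraints in past instances.
history = [{"L3", "L7"}, {"L3"}, {"L3", "L9"}, set()]
predicted = predict_relevant_constraints(history, threshold=0.3)
print(sorted(predicted))  # → ['L3']

# Toy stand-ins (hypothetical): the constraints truly needed are L3 and L9.
needed = {"L3", "L9"}
solution, active = solve_with_screening(
    solve=lambda ids: frozenset(ids),
    find_violated=lambda sol: needed - sol,
    initial=predicted,
)
print(sorted(active))  # → ['L3', 'L9']
```

The check-and-resolve loop is what keeps a wrong prediction from silently producing an infeasible schedule. The cost of a misprediction is an extra solve rather than a wrong answer.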
-
A Convergence indicator for Multi-Objective Optimisation Algorithms
Authors:
Thiago Santos,
Sebastiao Xavier
Abstract:
The use of multi-objective optimisation algorithms has grown in recent years, which creates a need for ways to compare their results. In this respect, performance measures play a key role. In general, these measures assess properties of the algorithms such as capacity, convergence, diversity, or convergence-diversity. Well-known measures include generational distance (GD), inverted generational distance (IGD), hypervolume (HV), Spread ($Δ$), averaged Hausdorff distance ($Δ_p$), and the R2 indicator, among others. In this paper, we propose a new indicator to measure convergence based on the traditional formula for Shannon entropy. The main features of this measure are: 1) it does not require knowledge of the true Pareto set, and 2) it has a medium computational cost compared with the hypervolume.
Submitted 29 October, 2018;
originally announced October 2018.
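The abstract does not give the proposed indicator's definition, so the Python sketch below shows only two ingredients it mentions, and it should not be read as the authors' indicator. The first is the Shannon entropy formula H = −Σ p_i log p_i. The second is, for contrast, one common variant of generational distance, which unlike the proposed measure needs a reference (true) Pareto front. The function names and test points are hypothetical.

```python
import math


def shannon_entropy(weights, base=2.0):
    """H = -sum_i p_i log(p_i), with p_i = w_i / sum(w) and 0 log 0 taken as 0."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return -sum(
        (w / total) * (math.log(w / total) / math.log(base))
        for w in weights
        if w > 0
    )


def generational_distance(front, reference):
    """Mean Euclidean distance from each obtained point to its nearest point on
    a reference Pareto front (one common GD variant; definitions differ)."""
    return sum(min(math.dist(p, r) for r in reference) for p in front) / len(front)


print(shannon_entropy([1, 1, 1, 1]))  # uniform over 4 cells: log2(4) = 2 bits
print(generational_distance([(0, 1), (1, 0)], [(0, 0.5), (1, 0)]))  # → 0.25
```

GD cannot be computed without `reference`, which illustrates why an indicator that avoids the true Pareto set is attractive when that set is unknown.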