
The Web unpacked: a quantitative analysis of global Web usage

Abstract: This paper presents a comprehensive analysis of global web usage patterns based on data from SimilarWeb, a leading source for estimating web traffic. Leveraging a dataset comprising over 250,000 websites, we estimate the total web traffic and investigate its distribution among domains and industry sectors. We detail the characteristics of the top 116 domains, which comprise an estimated one-third of all web traffic. Our analysis scrutinizes various attributes of these domains, including their content sources and types, access requirements, offline presence, and ownership features. Our analysis reveals a significant concentration of web traffic, with a diminutive number of top websites capturing the majority of visits. Search engines, news and media, social networks, streaming, and adult content emerge as primary attractors of web traffic, which is also highly concentrated on platforms and USA-owned websites. Much of the traffic goes to for-profit but mostly free-of-charge websites, highlighting the dominance of business models not based on paywalls.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 12 pages, 10 figures, 3 tables

arXiv:2311.05330

doi 10.1145/3589335.3651911

A Bayesian framework for measuring association and its application to emotional dynamics in Web discourse

Authors: Henrique S. Xavier, Diogo Cortiz, Mateus Silvestrin, Ana Luísa Freitas, Letícia Yumi Nakao Morello, Fernanda Naomi Pantaleão, Gabriel Gaudencio do Rêgo

Abstract: This paper introduces a Bayesian framework designed to measure the degree of association between categorical random variables. The method is grounded in the formal definition of variable independence and is implemented using Markov Chain Monte Carlo (MCMC) techniques. Unlike commonly employed techniques in Association Rule Learning, this approach enables a clear and precise estimation of confidence intervals and the statistical significance of the measured degree of association. We applied the method to non-exclusive emotions identified by annotators in 4,613 tweets written in Portuguese. This analysis revealed pairs of emotions that exhibit associations and mutually opposed pairs. Moreover, the method identifies hierarchical relations between categories, a feature observed in our data, and is utilized to cluster emotions into basic-level groups.

Submitted 11 March, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: 9 pages, 2 tables, 4 figures. Accepted for publication at the Beyond Facts workshop of the Web Conference 2024

arXiv:2310.03258

Assessing Electricity Service Unfairness with Transfer Counterfactual Learning

Authors: Song Wei, Xiangrui Kong, Alinson Santos Xavier, Shixiang Zhu, Yao Xie, Feng Qiu

Abstract: Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in counterfactual effects, and limited data availability. First, this paper demonstrates how one can evaluate counterfactual unfairness in a power system by analyzing the average causal effect of a specific protected attribute. Subsequently, we use subgroup analysis to handle model heterogeneity and introduce a novel method for estimating counterfactual unfairness based on transfer learning, which helps to alleviate the data scarcity in each subgroup. In our numerical analysis, we apply our method to a unique large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages under both daily and post-disaster operations, and such discrimination is exacerbated under severe conditions. These findings suggest a widespread, systematic issue of injustice in the power service systems and emphasize the necessity for focused interventions in disadvantaged communities.

Submitted 24 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: The preliminary version titled "Detecting Electricity Service Equity Issues with Transfer Counterfactual Learning on Large-Scale Outage Datasets" is presented at NeurIPS 2023 Workshops on Causal Representation Learning (CRL) and Algorithmic Fairness through the Lens of Time (AFT); See v1

arXiv:2301.04696

doi 10.5281/zenodo.7513695

On Modeling Network Slicing Communication Resources with SARSA Optimization

Authors: Eduardo S. Xavier, Nazim Agoulmine, Joberto S. B. Martins

Abstract: Network slicing is a crucial enabler to support the composition and deployment of virtual network infrastructures required by the dynamic behavior of networks like 5G/6G mobile networks, IoT-aware networks, e-health systems, and industry verticals like the internet of vehicles (IoV) and industry 4.0. The communication slices and their allocated communication resources are essential in slicing architectures for resource orchestration and allocation, virtual network function (VNF) deployment, and slice operation functionalities. The communication slices provide the communications capabilities required to support slice operation, SLA guarantees, and QoS/QoE application requirements. Therefore, this contribution proposes a network slicing conceptual model to formulate the optimization problem related to the sharing of communication resources among communication slices. First, we present a conceptual model of network slicing; we then analytically formulate some aspects of the model and the optimization problem to address. Next, we propose using a SARSA agent to solve the problem and implement a proof-of-concept prototype. Finally, we present the obtained results and discuss them.

Submitted 11 January, 2023; originally announced January 2023.

Comments: 8 pages, 9 figures, ADVANCE conference paper

ACM Class: C.2; F.1.1; I.2

arXiv:1902.01697

Learning to Solve Large-Scale Security-Constrained Unit Commitment Problems

Authors: Alinson S. Xavier, Feng Qiu, Shabbir Ahmed

Abstract: Security-Constrained Unit Commitment (SCUC) is a fundamental problem in power systems and electricity markets. In practical settings, SCUC is repeatedly solved via Mixed-Integer Linear Programming, sometimes multiple times per day, with only minor changes in input data. In this work, we propose a number of machine learning (ML) techniques to effectively extract information from previously solved instances in order to significantly improve the computational performance of MIP solvers when solving similar instances in the future. Based on statistical data, we predict redundant constraints in the formulation, good initial feasible solutions and affine subspaces where the optimal solution is likely to lie, leading to a significant reduction in problem size. Computational results on a diverse set of realistic and large-scale instances show that, using the proposed techniques, SCUC can be solved on average 4.3x faster with optimality guarantees, and 10.2x faster without optimality guarantees, but with no observed reduction in solution quality. Out-of-distribution experiments provide evidence that the method is somewhat robust against dataset shift.

Submitted 18 December, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

arXiv:1810.12140

A Convergence indicator for Multi-Objective Optimisation Algorithms

Authors: Thiago Santos, Sebastiao Xavier

Abstract: Multi-objective optimisation algorithms have seen relative growth in recent years, which calls for some way of comparing their results. In this sense, performance measures play a key role. In general, such measures consider properties of these algorithms such as capacity, convergence, diversity or convergence-diversity. Known measures include generational distance (GD), inverted generational distance (IGD), hypervolume (HV), Spread ($Δ$), Averaged Hausdorff distance ($Δ_p$), and the R2-indicator, among others. In this paper, we focus on proposing a new indicator to measure convergence based on the traditional formula for Shannon entropy. The main features of this measure are: 1) it does not require knowing the true Pareto set, and 2) it has a medium computational cost compared with hypervolume.

Submitted 29 October, 2018; originally announced October 2018.

Comments: Submitted to TEMA

Showing 1–6 of 6 results for author: Xavier, S