-
BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction
Authors:
Dong Wang,
Kavé Salamatian,
Yunqing Xia,
Weiwei Deng,
Qi Zhiang
Abstract:
Although deep pre-trained language models have shown promising benefit in a large set of industrial scenarios, including Click-Through-Rate (CTR) prediction, how to integrate pre-trained language models that handle only textual signals into a prediction pipeline with non-textual features is challenging.
Up to now, two directions have been explored to integrate multi-modal inputs in the fine-tuning of pre-trained language models. One consists of fusing the outcome of language models and non-textual features through an aggregation layer, resulting in an ensemble framework where the cross-information between textual and non-textual inputs is learned only in the aggregation layer. The second consists of splitting non-textual features into fine-grained fragments and transforming the fragments into new tokens combined with textual ones, so that they can be fed directly to the transformer layers in language models. However, this approach increases the complexity of learning and inference because of the numerous additional tokens.
To address these limitations, we propose in this work a novel framework, BERT4CTR, with a Uni-Attention mechanism that can benefit from the interactions between non-textual and textual features while maintaining low time costs in training and inference through dimensionality reduction. Comprehensive experiments on both public and commercial data demonstrate that BERT4CTR can significantly outperform state-of-the-art frameworks in handling multi-modal inputs and is applicable to CTR prediction.
Submitted 17 August, 2023;
originally announced August 2023.
-
Wide-AdGraph: Detecting Ad Trackers with a Wide Dependency Chain Graph
Authors:
Amir Hossein Kargaran,
Mohammad Sadegh Akhondzadeh,
Mohammad Reza Heidarpour,
Mohammad Hossein Manshaei,
Kave Salamatian,
Masoud Nejad Sattary
Abstract:
Websites use third-party ads and tracking services to deliver targeted ads and collect information about the users that visit them. These services put users' privacy at risk, which is why users' demand for blocking them is growing. Most blocking solutions rely on crowd-sourced filter lists manually maintained by a large community of users. In this work, we seek to simplify the update of these filter lists by combining different websites through a large-scale graph connecting all resource requests made over a large set of sites. The features of this graph are extracted and used to train a machine learning algorithm with the aim of detecting ads and tracking resources. As our approach combines different information sources, it is more robust to evasion techniques that use obfuscation or change the usage patterns. We evaluate our work over the Alexa top-10K websites and find its accuracy to be 96.1% biased and 90.9% unbiased, with high precision and recall. It can also block new ads and tracking services that would otherwise need to be added to existing crowd-sourced filter lists. Moreover, the approach followed in this paper sheds light on the ecosystem of third-party tracking and advertising.
Submitted 10 May, 2021; v1 submitted 29 April, 2020;
originally announced April 2020.
-
The geopolitics behind the routes data travels: a case study of Iran
Authors:
Loqman Salamatian,
Frederick Douzet,
Kevin Limonier,
Kavé Salamatian
Abstract:
The global expansion of the Internet has brought many challenges to geopolitics. Cyberspace is a space of strategic priority for many states. Understanding and representing its geography remains an ongoing challenge. Nevertheless, we need to comprehend Cyberspace as a space organized by humans in order to analyse the strategies of the actors. This geography requires a multidisciplinary dialogue associating geopolitics, computer science and mathematics. Cyberspace is represented as three superposed and interacting layers: the physical, logical, and informational layers. This paper focuses on the logical layer through an analysis of the structure of connectivity and the Border Gateway Protocol (BGP). This protocol determines the routes taken by the data. It has been leveraged by countries to control the flow of information and to block access to content (going up to full disruption of the Internet), or for active strategic purposes such as hijacking traffic or attacking infrastructures. Several countries have opted for a BGP strategy. The goal of this study is to characterize these strategies, to link them to current architectures and to understand their resilience in times of crisis. Our hypothesis is that there are connections between the network architecture shaped through BGP and the strategies of stakeholders at a national level. We chose to focus on the case of Iran because it presents an interesting BGP architecture and holds a central position in the connectivity of the Middle East. Moreover, Iran is at the center of several ongoing geopolitical rifts. Our observations make it possible to infer three ways in which Iran could have used BGP to achieve its strategic goals: the pursuit of a self-sustaining national Internet with controlled borders; the will to set up an Iranian Intranet to facilitate censorship; and the leverage of connectivity as a tool of regional influence.
Submitted 19 November, 2019; v1 submitted 18 November, 2019;
originally announced November 2019.
-
Pull-based Bloom Filter-based Routing for Information-Centric Networks
Authors:
Ali Marandi,
Torsten Braun,
Kave Salamatian,
Nikolaos Thomos
Abstract:
In Named Data Networking (NDN), there is a need for routing protocols to populate Forwarding Information Base (FIB) tables so that Interest messages can be forwarded. To populate FIBs, clients and routers require some routing information. One method to obtain this information is for network nodes to exchange routing information, with each node advertising its available content objects. Bloom Filter-based Routing approaches like BFR [1] use Bloom Filters (BFs) to advertise all provided content objects, which consumes valuable bandwidth and storage resources. This strategy is inefficient, as clients request only a small number of the provided content objects and do not need the content advertisement information for all of them. In this paper, we propose a novel routing algorithm for NDN called pull-based BFR, in which servers only advertise the demanded file names. We compare the performance of pull-based BFR with original BFR and with a flooding-assisted routing protocol. Our experimental evaluations show that pull-based BFR outperforms original BFR in terms of the communication overhead needed for content advertisements, the average round-trip delay, the memory resources needed for storing content advertisements at clients and routers, and the impact of false positive reports on routing. The comparisons also show that pull-based BFR outperforms flooding-assisted routing in terms of average round-trip delay.
Submitted 28 September, 2018;
originally announced September 2018.
-
A Geometric Approach for Real-time Monitoring of Dynamic Large Scale Graphs: AS-level graphs illustrated
Authors:
Loqman Salamatian,
Dali Kaafar,
Kavé Salamatian
Abstract:
The monitoring of large dynamic networks is a major challenge for a wide range of applications. The complexity stems from properties of the underlying graphs, in which slight local changes can lead to sizable variations of global properties; e.g., under certain conditions, a single link cut that may be overlooked during monitoring can result in splitting the graph into two disconnected components. Moreover, it is often difficult to determine whether a change will propagate globally or remain local. Traditional graph theory measures such as the centrality or the assortativity of the graph are not satisfactory for characterizing global properties of the graph. In this paper, we tackle the problem of real-time monitoring of dynamic large scale graphs by developing a geometric approach that leverages notions of geometric curvature and recent developments in graph embeddings using Ollivier-Ricci curvature [47]. We illustrate the use of our method by considering the practical case of monitoring dynamic variations of the global Internet using topology change information provided by combining several BGP feeds. In particular, we use our method to detect major events and changes via the geometry of the embedding of the graph.
Submitted 2 June, 2018;
originally announced June 2018.
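For readers unfamiliar with the curvature notion named in the abstract above, the standard definition of Ollivier-Ricci curvature on a graph can be written as follows. This is the textbook form, not taken from the paper: the symbols $d$, $m_x$, and $W_1$ are notation assumed here, and the abstract does not say which neighbourhood measure $m_x$ the authors use.

```latex
% Ollivier-Ricci curvature of a pair of distinct vertices x, y.
% d(x, y): shortest-path distance in the graph.
% m_x:     a probability measure supported on x and its neighbours
%          (the exact choice is an assumption; the abstract does not fix it).
% W_1:     Wasserstein-1 (earth mover's) distance between two measures.
\kappa(x, y) \;=\; 1 - \frac{W_1(m_x, m_y)}{d(x, y)}
```

Intuitively, $\kappa(x, y) > 0$ when the neighbourhoods of $x$ and $y$ are closer to each other than $x$ and $y$ themselves (as inside a densely connected cluster), and $\kappa(x, y) < 0$ when they are farther apart (as across a bridge-like link).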
-
BFR: a Bloom Filter-based Routing Approach for Information-Centric Networks
Authors:
Ali Marandi,
Torsten Braun,
Kave Salamatian,
Nikolaos Thomos
Abstract:
Locating the demanded content is one of the major challenges in Information-Centric Networking (ICN). This process is known as content discovery. To facilitate content discovery, in this paper we focus on Named Data Networking (NDN) and propose a novel routing scheme for content discovery, called Bloom Filter-based Routing (BFR), which is fully distributed, content oriented, and topology agnostic at the intra-domain level. In BFR, origin servers advertise their content objects using Bloom filters. We compare the performance of the proposed BFR with flooding and shortest path content discovery approaches. BFR outperforms its counterparts in terms of the average round-trip delay, while it is shown to be very robust to false positive reports from Bloom filters. Also, BFR is much more robust than shortest path routing to topology changes. BFR strongly outperforms flooding and performs almost on par with shortest path routing with respect to the normalized communication costs for data retrieval and the total communication overhead for forwarding Interests. All three approaches achieve similar mean hit distance. The signalling overhead for content advertisement in BFR is much lower than the signalling overhead for calculating shortest paths in the shortest path approach. Finally, BFR requires only a small storage overhead for maintaining content advertisements.
Submitted 1 February, 2017;
originally announced February 2017.
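To make the idea of advertising content objects with a Bloom filter concrete, here is a minimal Python sketch. It is not the BFR implementation: the `BloomFilter` class, the filter size, the number of hash functions, the SHA-256-based hashing, and the content names are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; sizes and hashing scheme are illustrative."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single Python integer

    def _positions(self, name):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        for pos in self._positions(name):
            self.bits |= 1 << pos

    def __contains__(self, name):
        # True means "possibly advertised"; False means "definitely not".
        return all((self.bits >> pos) & 1 for pos in self._positions(name))


# Hypothetical origin server advertising two content names.
advert = BloomFilter()
for content_name in ("/videos/intro.mp4", "/docs/spec.pdf"):
    advert.add(content_name)

print("/videos/intro.mp4" in advert)  # True: added names always match
```

A node receiving `advert` can test a requested name for membership without holding the full list of names. A name that was never added may occasionally test positive, which is the kind of false positive report the abstract refers to.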
-
Optimization of Bloom Filter Parameters for Practical Bloom Filter Based Epidemic Forwarding in DTNs
Authors:
Ali Marandi,
Mahdi Faghi Imani,
Kave Salamatian
Abstract:
Epidemic forwarding has been proposed as a forwarding technique to achieve opportunistic communication in Delay Tolerant Networks. Although this technique is well known and widely referenced, several practical problems must be dealt with before using it. In particular, in order to manage redundancy and to avoid useless transmissions, it has been proposed to ask nodes to exchange information about their buffer content prior to sending information. While Bloom filters have been proposed to transport the buffer content information, to our knowledge no real evaluation has been provided of the tradeoff that exists in practice. In this paper we describe an implementation of an epidemic forwarding scheme using Bloom filters. We then propose some strategies for Bloom filter management based on windowing and describe implementation tradeoffs. By simulating our proposed strategies in ns-3, both with random waypoint mobility and with realistic mobility traces coming from San Francisco taxicabs, we show that our proposed strategies alleviate the challenge of using epidemic forwarding in DTNs.
Submitted 19 August, 2012;
originally announced August 2012.
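As background for the parameter tradeoff mentioned in the abstract above, the textbook relations for a Bloom filter are given below. They are standard results, not the paper's own analysis; the symbols $m$ (bits in the filter), $n$ (inserted items), and $k$ (hash functions) are notation assumed here, and the paper's windowing strategies are not modelled.

```latex
% Approximate false-positive probability after inserting n items
% into an m-bit filter with k hash functions:
p \;\approx\; \left(1 - e^{-kn/m}\right)^{k}

% For fixed m and n, p is minimized at
k_{\mathrm{opt}} \;=\; \frac{m}{n}\,\ln 2
\qquad\Longrightarrow\qquad
p_{\min} \;\approx\; \left(\tfrac{1}{2}\right)^{k_{\mathrm{opt}}}
\;\approx\; 0.6185^{\,m/n}

% Worked example: m/n = 10 bits per item gives k_opt \approx 7
% and p_min \approx 0.6185^{10} \approx 0.008, i.e. about 0.8%.
```

The tradeoff is therefore between filter size (bandwidth spent on each exchange) and the false-positive rate (messages wrongly assumed to be already held by the peer).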
-
Characterization of P2P IPTV Traffic: Scaling Analysis
Authors:
Thomas Silverston,
Olivier Fourmaux,
Kave Salamatian
Abstract:
P2P IPTV applications are emerging on the Internet and will be massively used in the future. It is expected that P2P IPTV will contribute to increasing overall Internet traffic. In this context, it is important to measure the impact of P2P IPTV on the networks and to characterize this traffic. During the 2006 FIFA World Cup, we performed an extensive measurement campaign. We measured the network traffic generated by the broadcasting of soccer games by the most popular P2P IPTV applications, namely PPLive, PPStream, SOPCast and TVAnts. From the collected data, we characterized the P2P IPTV traffic structure at different time scales by using a wavelet-based transform method. To the best of our knowledge, this is the first work that presents a complete multiscale analysis of P2P IPTV traffic. Our results show that the scaling properties of the TCP traffic present periodic behavior, whereas the UDP traffic is stationary and leads to long-range dependency characteristics. For all the applications, the download traffic has different characteristics than the upload traffic. The signaling traffic has a significant impact on the download traffic but a negligible impact on the upload. Both sides of the traffic and its granularity have to be taken into account to design accurate P2P IPTV traffic models.
Submitted 17 July, 2007; v1 submitted 24 April, 2007;
originally announced April 2007.
-
Describing and Simulating Internet Routes
Authors:
Jeremie Leguay,
Matthieu Latapy,
Timur Friedman,
Kave Salamatian
Abstract:
This paper introduces relevant statistics for the description of routes in the internet, seen as a graph at the interface level. Based on the observed properties, we propose and evaluate methods for generating artificial routes suitable for simulation purposes. The work in this paper is based upon a study of over seven million route traces produced by CAIDA's skitter infrastructure.
Submitted 15 November, 2004;
originally announced November 2004.