doi 10.1145/3629146

Packed to the Brim: Investigating the Impact of Highly Responsive Prefixes on Internet-wide Measurement Campaigns

Authors: Patrick Sattler, Johannes Zirngibl, Mattijs Jonker, Oliver Gasser, Georg Carle, Ralph Holz

Abstract: Internet-wide scans are an important tool to evaluate the deployment of services. To enable large-scale application layer scans, a fast, stateless port scan (e.g., using ZMap) is often performed ahead of time to collect responsive targets. It is a common expectation that port scans on the entire IPv4 address space provide a relatively unbiased view as they cover the complete address space. Previous work, however, has found prefixes where all addresses share particular properties. In IPv6, aliased prefixes and fully responsive prefixes, i.e., prefixes where all addresses are responsive, are a well-known phenomenon. However, there is no such in-depth analysis for prefixes with these responsiveness patterns in IPv4. This paper delves into the underlying factors of this phenomenon in the context of IPv4 and evaluates port scans on a total of 161 ports (142 TCP & 19 UDP ports) from three different vantage points. To account for packet loss and other scanning artifacts, we propose the notion of a new category of prefixes, which we call highly responsive prefixes (HRPs). Our findings show that the share of HRPs can make up 70 % of responsive addresses on selected ports. Regarding specific ports, we observe that CDNs contribute to the largest fraction of HRPs on TCP/80 and TCP/443, while TCP proxies emerge as the primary cause of HRPs on other ports. Our analysis also reveals that application layer handshakes to targets outside HRPs are, depending on the chosen service, up to three times more likely to be successful compared to handshakes with targets located in HRPs. To improve future scanning campaigns conducted by the research community, we make our study's data publicly available and provide a tool for detecting HRPs. Furthermore, we propose an approach for a more efficient, ethical, and sustainable application layer target selection.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.11799

Testing for patterns and structures in covariance and correlation matrices

Authors: Paavo Sattler, Dennis Dobler

Abstract: Covariance matrices of random vectors contain information that is crucial for modelling. Certain structures and patterns of the covariances (or correlations) may be used to justify parametric models, e.g., autoregressive models. Until now, there have been only a few approaches for testing such covariance structures systematically and in a unified way. In the present paper, we propose such a unified testing procedure, and we will exemplify the approach with a large variety of covariance structure models. This includes common structures such as diagonal matrices, Toeplitz matrices, and compound symmetry but also the more involved autoregressive matrices. We propose hypothesis tests for these structures, and we use bootstrap techniques for better small-sample approximation. The structures of the proposed tests invite adaptations to other covariance patterns by choosing the hypothesis matrix appropriately. We prove their correctness for large sample sizes. The proposed methods require only weak assumptions. With the help of a simulation study, we assess the small sample properties of the tests. We also analyze a real data set to illustrate the application of the procedure.

Submitted 12 January, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.05562

Choice of the hypothesis matrix for using the Wald-type-statistic

Authors: Paavo Sattler, Georg Zimmermann

Abstract: A widely used formulation for null hypotheses in the analysis of multivariate $d$-dimensional data is $\mathcal{H}_0: \boldsymbol{H} \boldsymbol{\theta} =\boldsymbol{y}$ with $\boldsymbol{H}\in\mathbb{R}^{m\times d}$, $\boldsymbol{\theta}\in \mathbb{R}^d$ and $\boldsymbol{y}\in\mathbb{R}^m$, where $m\leq d$. Here the unknown parameter vector $\boldsymbol{\theta}$ can, for example, be the expectation vector $\boldsymbol{\mu}$, a vector $\boldsymbol{\beta}$ containing regression coefficients or a quantile vector $\boldsymbol{q}$. Also, the vector of nonparametric relative effects $\boldsymbol{p}$ or an upper triangular vectorized covariance matrix $\textbf{v}$ are useful choices. However, even without multiplying the hypothesis with a scalar $\gamma\neq 0$, there is a multitude of possibilities to formulate the same null hypothesis with different hypothesis matrices $\boldsymbol{H}$ and corresponding vectors $\boldsymbol{y}$. Although it is a well-known fact that in case of $\boldsymbol{y}=\boldsymbol{0}$ there exists a unique projection matrix $\boldsymbol{P}$ with $\boldsymbol{H}\boldsymbol{\theta}=\boldsymbol{0}\Leftrightarrow \boldsymbol{P}\boldsymbol{\theta}=\boldsymbol{0}$, for $\boldsymbol{y}\neq \boldsymbol{0}$ such a projection matrix does not necessarily exist. Moreover, since such hypotheses are often investigated using a quadratic form as the test statistic, the corresponding projection matrices often contain zero rows; so, they are not even effective from a computational aspect. In this manuscript, we show that for the Wald-type-statistic (WTS), which is one of the most frequently used quadratic forms, the choice of the concrete hypothesis matrix does not affect the test decision. Moreover, some simulations are conducted to investigate the possible influence of the hypothesis matrix on the computation time.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.10516

doi 10.1145/3606464.3606474

Evaluating the Benefits: Quantifying the Effects of TCP Options, QUIC, and CDNs on Throughput

Authors: Simon Bauer, Patrick Sattler, Johannes Zirngibl, Christoph Schwarzenberg, Georg Carle

Abstract: To keep up with increasing demands on quality of experience, assessing and understanding the performance of network connections is crucial for web service providers. While different measures, like TCP options, alternative transport layer protocols like QUIC, or the hosting of services in CDNs, are expected to improve connection performance, no studies are quantifying such impacts on connections on the Internet. This paper introduces an active Internet measurement approach to assess the impacts of mentioned measures on connection performance. We conduct downloads from public web servers considering different vantage points, extract performance indicators like throughput, RTT, and retransmission rate, and survey speed-ups due to TCP option usage. Further, we compare the performance of QUIC-based downloads to TCP-based downloads considering different option configurations. Next to significant throughput improvements due to TCP option usage, in particular TCP window scaling, and QUIC, our study shows significantly increased performance for connections to domains hosted by different giant CDNs.

Submitted 19 September, 2023; originally announced September 2023.

Comments: Presented at the ACM/IRTF Applied Networking Research Workshop 2023 (ANRW23)

arXiv:2309.10344

doi 10.1109/EuroSPW59978.2023.00058

A First Look at SVCB and HTTPS DNS Resource Records in the Wild

Authors: Johannes Zirngibl, Patrick Sattler, Georg Carle

Abstract: The Internet Engineering Task Force is standardizing new DNS resource records, namely SVCB and HTTPS. Both records inform clients about endpoint and service properties such as supported application layer protocols, IP address hints or Encrypted Client Hello (ECH) information. Therefore, they allow clients to reduce required DNS queries and potential retries during connection establishment and thus help to improve the quality of experience and privacy of the client. The latter is achieved by reducing visible meta-data, which is further improved with encrypted DNS and ECH. The standardization is in its final stages and companies announced support, e.g., Cloudflare and Apple. Therefore, we provide the first large-scale overview of actual record deployment by analyzing more than 400 M domains. We find 3.96 k SVCB and 10.5 M HTTPS records. As of March 2023, Cloudflare hosts and serves most domains, and most records only contain Application-Layer Protocol Negotiation (ALPN) and IP address hints. Besides Cloudflare, we see adoption by a variety of authoritative name servers and hosting providers indicating increased adoption in the near future. Lastly, we can verify the correctness of records for more than 93 % of domains based on three application layer scans.

Submitted 19 September, 2023; originally announced September 2023.

Comments: Presented at the 8th International Workshop on Traffic Measurements for Cybersecurity (WTMC 2023)

arXiv:2308.15841

doi 10.1007/978-3-031-56252-5_13

QUIC Hunter: Finding QUIC Deployments and Identifying Server Libraries Across the Internet

Authors: Johannes Zirngibl, Florian Gebauer, Patrick Sattler, Markus Sosnowski, Georg Carle

Abstract: The diversity of QUIC implementations poses challenges for Internet measurements and the analysis of the QUIC ecosystem. While all implementations follow the same specification and there is general interoperability, differences in performance, functionality, but also security (e.g., due to bugs) can be expected. Therefore, knowledge about the implementation of an endpoint on the Internet can help researchers, operators, and users to better analyze connections, performance, and security. In this work, we improved the detection rate of QUIC scans to find more deployments and provide an approach to effectively identify QUIC server libraries based on CONNECTION CLOSE frames and transport parameter orders. We performed Internet-wide scans and identified at least one deployment for 18 QUIC libraries. In total, we can identify the libraries with 8.0 M IPv4 and 2.5 M IPv6 addresses. We provide a comprehensive view of the landscape of competing QUIC libraries.

Submitted 19 March, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: preprint

Journal ref: Proceedings of the Passive and Active Measurement Conference 2024 (PAM '24)

arXiv:2302.11393

How Ready Is DNS for an IPv6-Only World?

Authors: Florian Streibelt, Patrick Sattler, Franziska Lichtblau, Carlos H. Gañán, Anja Feldmann, Oliver Gasser, Tobias Fiebig

Abstract: DNS is one of the core building blocks of the Internet. In this paper, we investigate DNS resolution in a strict IPv6-only scenario and find that a substantial fraction of zones cannot be resolved. We point out that the presence of an AAAA resource record for a zone's nameserver does not necessarily imply that it is resolvable in an IPv6-only environment since the full DNS delegation chain must resolve via IPv6 as well. Hence, in an IPv6-only setting zones may experience an effect similar to what is commonly referred to as lame delegation. Our longitudinal study shows that the continuing centralization of the Internet has a large impact on IPv6 readiness, i.e., a small number of large DNS providers has, and still can, influence IPv6 readiness for a large number of zones. A single operator that enabled IPv6 DNS resolution -- by adding IPv6 glue records -- was responsible for around 20.3% of all zones in our dataset not resolving over IPv6 until January 2017. Even today, 10% of DNS operators are responsible for more than 97.5% of all zones that do not resolve using IPv6.

Submitted 22 February, 2023; originally announced February 2023.

Journal ref: Proceedings of the Passive and Active Measurement Conference 2023 (PAM '23)

arXiv:2210.03024

doi 10.1088/1367-2630/acef4c

Homogeneous Electron Liquid in Arbitrary Dimensions: Exchange and Correlation Using the Singwi-Tosi-Land-Sjölander Approach

Authors: L. V. Duc Pham, Pascal Sattler, Miguel A. L. Marques, Carlos L. Benavides-Riveros

Abstract: The ground states of the homogeneous electron gas and the homogeneous electron liquid are cornerstones in quantum physics and chemistry. They are archetypal systems in the regime of slowly varying densities in which the exchange-correlation energy can be estimated with a myriad of methods. For high densities, the behavior of the energy is well-known for 1, 2, and 3 dimensions. Here, we extend this model to arbitrary integer dimensions, and compute its correlation energy beyond the random phase approximation (RPA), using the celebrated approach developed by Singwi, Tosi, Land, and Sjölander (STLS), which is known to be remarkably accurate in the description of the full electronic density response for $2D$ and $3D$, both in the paramagnetic and ferromagnetic ground states. For higher dimensions, we compare the results obtained for the correlation energy using the STLS method with the values previously obtained using RPA. We found that at high dimensions STLS tends to be more physical in the sense that the infamous sum rules are better satisfied by the theory. We furthermore illustrate the importance of the plasmon contribution to STLS theory.

Submitted 1 March, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: 14 pages, 9 figures

Journal ref: New J. Phys. 25, 083040 (2023)

arXiv:2209.09317

doi 10.1145/3517745.3561440

Rusty Clusters? Dusting an IPv6 Research Foundation

Authors: Johannes Zirngibl, Lion Steger, Patrick Sattler, Oliver Gasser, Georg Carle

Abstract: The long-running IPv6 Hitlist service is an important foundation for IPv6 measurement studies. It helps to overcome infeasible, complete address space scans by collecting valuable, unbiased IPv6 address candidates and regularly testing their responsiveness. However, the Internet itself is a quickly changing ecosystem that can affect long-running services, potentially inducing biases and obscurities into ongoing data collection means. Frequent analyses but also updates are necessary to enable a valuable service to the community. In this paper, we show that the existing hitlist is highly impacted by the Great Firewall of China, and we offer a cleaned view on the development of responsive addresses. While the accumulated input shows an increasing bias towards some networks, the cleaned set of responsive addresses is well distributed and shows a steady increase. Although it is a best practice to remove aliased prefixes from IPv6 hitlists, we show that this also removes major content delivery networks. More than 98% of all IPv6 addresses announced by Fastly were labeled as aliased and Cloudflare prefixes hosting more than 10M domains were excluded. Depending on the hitlist usage, e.g., higher layer protocol scans, inclusion of addresses from these providers can be valuable. Lastly, we evaluate different new address candidate sources, including target generation algorithms to improve the coverage of the current IPv6 Hitlist. We show that a combination of different methodologies is able to identify 5.6M new, responsive addresses. This accounts for an increase by 174% and combined with the current IPv6 Hitlist, we identify 8.8M responsive addresses.

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2209.04380

Testing Hypotheses about Correlation Matrices in General MANOVA Designs

Authors: Paavo Sattler, Markus Pauly

Abstract: Correlation matrices are an essential tool for investigating the dependency structures of random vectors or comparing them. We introduce an approach for testing a variety of null hypotheses that can be formulated based upon the correlation matrix. Examples cover MANOVA-type hypotheses of equal correlation matrices as well as testing for special correlation structures such as, e.g., sphericity. Apart from existing fourth moments, our approach requires no other assumptions, allowing applications in various settings. To improve the small sample performance, a bootstrap technique is proposed and theoretically justified. Based on this, we also present a procedure to simultaneously test the hypotheses of equal correlation and equal covariance matrices. The performance of all new test statistics is compared with existing procedures through extensive simulations.

Submitted 11 July, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

arXiv:2209.00965

Waiting for QUIC: On the Opportunities of Passive Measurements to Understand QUIC Deployments

Authors: Jonas Mücke, Marcin Nawrocki, Raphael Hiesgen, Patrick Sattler, Johannes Zirngibl, Georg Carle, Thomas C. Schmidt, Matthias Wählisch

Abstract: In this paper, we study the potentials of passive measurements to gain advanced knowledge about QUIC deployments. By analyzing one month of backscatter traffic of the /9 CAIDA network telescope, we are able to make the following observations. First, we can identify different off-net deployments of hypergiants, using packet features such as QUIC source connection IDs (SCID), packet coalescence, and packet lengths. Second, Facebook and Google configure significantly different retransmission timeouts and maximum number of retransmissions. Third, SCIDs allow further insights into load balancer deployments such as number of servers per load balancer. We bolster our results by active measurements.

Submitted 2 September, 2022; originally announced September 2022.

Comments: preprint

arXiv:2207.09382

Inference for high-dimensional split-plot designs with different dimensions between groups

Authors: Paavo Sattler, Markus Pauly

Abstract: In repeated measures designs with multiple groups, the primary purpose is to compare different groups in various aspects. For several reasons, the number of measurements and therefore the dimension of the observation vectors can depend on the group, making the usage of existing approaches impossible. We develop an approach which can be used not only for a possibly increasing number of groups $a$, but also for group-dependent dimension $d_i$, which is allowed to go to infinity. This is a unique high-dimensional asymptotic framework that impresses through its variety and does without the usual conditions on the relation between sample size and dimension. It especially includes settings with fixed dimensions in some groups and increasing dimensions in other ones, which can be seen as semi-high-dimensional. To find an appropriate statistical test, new and innovative estimators are developed, which can be used under these diverse settings on $a$, $d_i$ and $n_i$ without any adjustments. We investigated the asymptotic distribution of a quadratic-form-based test statistic and developed an asymptotically correct test. Finally, an extensive simulation study is conducted to investigate the role of the single group's dimension.

Submitted 19 July, 2022; originally announced July 2022.

arXiv:2207.02112

doi 10.1145/3517745.3561426

Towards a Tectonic Traffic Shift? Investigating Apple's New Relay Network

Authors: Patrick Sattler, Juliane Aulbach, Johannes Zirngibl, Georg Carle

Abstract: Apple recently published its first Beta of the iCloud Private Relay, a privacy protection service with promises resembling the ones of VPNs. The architecture consists of two layers (ingress and egress), operated by disjoint providers. The service is directly integrated into Apple's operating systems and therefore provides a low entry level barrier for a large user base. It seems to be set up for major adoption with its relatively moderate entry-level price. This paper analyzes the iCloud Private Relay from a network perspective and its effect on the Internet and future measurement-based research. We perform EDNS0 Client Subnet DNS queries to collect ingress relay addresses and find 1586 IPv4 addresses. Supplementary RIPE Atlas DNS measurements reveal 1575 IPv6 addresses. Knowledge about these addresses helps to passively detect clients communicating through the relay network. According to our scans, from January through April, ingress addresses grew by 20%. The analysis of our scans through the relay network verifies Apple's claim of rotating egress addresses. Nevertheless, it reveals that ingress and egress relays can be located in the same autonomous system, thus sharing similar routes, potentially allowing traffic correlation.

Submitted 26 September, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

arXiv:2206.13230

Active TLS Stack Fingerprinting: Characterizing TLS Server Deployments at Scale

Authors: Markus Sosnowski, Johannes Zirngibl, Patrick Sattler, Georg Carle, Claas Grohnfeldt, Michele Russo, Daniele Sgandurra

Abstract: Active measurements can be used to collect server characteristics on a large scale. This kind of metadata can help discover hidden relations and commonalities among server deployments, offering new possibilities to cluster and classify them. As an example, identifying previously unknown cybercriminal infrastructures can be a valuable source for cyber-threat intelligence. We propose herein an active measurement-based methodology for acquiring Transport Layer Security (TLS) metadata from servers and leverage it for their fingerprinting. Our fingerprints capture the characteristic behavior of the TLS stack primarily caused by the implementation, configuration, and hardware support of the underlying server. Using an empirical optimization strategy that maximizes information gain from every handshake to minimize measurement costs, we generated 10 general-purpose Client Hellos used as scanning probes to create a large database of TLS configurations used for classifying servers. We fingerprinted 28 million servers from the Alexa and Majestic toplists and two Command and Control (C2) blocklists over a period of 30 weeks with weekly snapshots as foundation for two long-term case studies: classification of Content Delivery Network and C2 servers. The proposed methodology shows a precision of more than 99 % and enables a stable identification of new servers over time. This study describes a new opportunity for active measurements to provide valuable insights into the Internet that can be used in security-relevant use cases.

Submitted 30 August, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Original: https://dl.ifip.org/db/conf/tma/tma2022/tma2022-paper35.pdf Additional Material: https://active-tls-fingerprinting.github.io/

Journal ref: Proc. Network Traffic Measurement and Analysis Conference (TMA) 2022

arXiv:1911.01979

Manifold Asymptotics of Quadratic-Form-Based Inference in Repeated Measures Designs

Authors: Paavo Sattler

Abstract: Split-Plot or Repeated Measures Designs with multiple groups occur naturally in sciences. Their analysis is usually based on the classical Repeated Measures ANOVA. Roughly speaking, the latter can be shown to be asymptotically valid for large sample sizes $n_i$ assuming a fixed number of groups $a$ and time points $d$. However, for high-dimensional settings with $d>n_i$ this argument breaks down and statistical tests are often based on (standardized) quadratic forms. Furthermore, analysis of their limit behaviour is usually based on certain assumptions on how $d$ converges to $\infty$ with respect to $n_i$. As this may be hard to argue in practice, we do not want to make such restrictions. Moreover, sometimes also the number of groups $a$ may be large compared to $d$ or $n_i$. To also have an impression about the behaviour of (standardized) quadratic forms as test statistic, we analyze their asymptotics under diverse settings on $a$, $d$ and $n_i$. In fact, we combine all kinds of combinations, where they diverge or are bounded, in a unified framework. Studying the limit distributions in detail, we follow Sattler and Pauly (2018) and propose an approximation to obtain critical values. The resulting test together with its approximation approach is investigated in an extensive simulation study with a focus on the exceptional asymptotic frameworks which are the main focus of this work.

Submitted 9 March, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

arXiv:1909.06205

Testing Hypotheses about Covariance Matrices in General MANOVA Designs

Authors: Paavo Sattler, Arne C. Bathke, Markus Pauly

Abstract: We introduce a unified approach to testing a variety of rather general null hypotheses that can be formulated in terms of covariance matrices. These include as special cases, for example, testing for equal variances, equal traces, or for elements of the covariance matrix taking certain values. The proposed method only requires very few assumptions and thus promises to be of broad practical use. Two test statistics are defined, and their asymptotic or approximate sampling distributions are derived. In order to improve particularly the small-sample behavior of the resulting tests, two bootstrap-based methods are developed and theoretically justified. Several simulations shed light on the performance of the proposed tests. The analysis of a real data set illustrates the application of the procedures.

Submitted 22 December, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

Comments: Submitted to Statistica Sinica

arXiv:1706.09331

HLOC: Hints-Based Geolocation Leveraging Multiple Measurement Frameworks

Authors: Quirin Scheitle, Oliver Gasser, Patrick Sattler, Georg Carle

Abstract: Geographically locating an IP address is of interest for many purposes. There are two major ways to obtain the location of an IP address: querying commercial databases or conducting latency measurements. For structural Internet nodes, such as routers, commercial databases are limited by low accuracy, while current measurement-based approaches overwhelm users with setup overhead and scalability issues. In this work we present our system HLOC, aiming to combine the ease of database use with the accuracy of latency measurements. We evaluate HLOC on a comprehensive router data set of 1.4M IPv4 and 183k IPv6 routers. HLOC first extracts location hints from rDNS names, and then conducts multi-tier latency measurements. Configuration complexity is minimized by using publicly available large-scale measurement frameworks such as RIPE Atlas. Using this measurement, we can confirm or disprove the location hints found in domain names. We publicly release HLOC's ready-to-use source code, enabling researchers to easily increase geolocation accuracy with minimum overhead.

Submitted 28 June, 2017; originally announced June 2017.

Comments: As published in TMA'17 conference: http://tma.ifip.org/main-conference/

arXiv:1706.02592

Inference For High-Dimensional Split-Plot-Designs: A Unified Approach for Small to Large Numbers of Factor Levels

Authors: Paavo Sattler, Markus Pauly

Abstract: Statisticians increasingly face the problem of reconsidering the adaptability of classical inference techniques. In particular, diverse types of high-dimensional data structures are observed in various research areas, disclosing the boundaries of conventional multivariate data analysis. Such situations occur, e.g., frequently in life sciences whenever it is easier or cheaper to repeatedly generate a large number $d$ of observations per subject than recruiting many, say $N$, subjects. In this paper we discuss inference procedures for such situations in general heteroscedastic split-plot designs with $a$ independent groups of repeated measurements. These will, e.g., be able to answer questions about the occurrence of certain time, group and interaction effects or about particular profiles. The test procedures are based on standardized quadratic forms involving suitably symmetrized U-statistics-type estimators which are robust against an increasing number of dimensions $d$ and/or groups $a$. We then discuss their limit distributions in a general asymptotic framework and additionally propose improved small sample approximations. Finally, their small sample performance is investigated in simulations and the applicability is illustrated by a real data analysis.

Submitted 8 June, 2017; originally announced June 2017.

MSC Class: 62H15

Showing 1–18 of 18 results for author: Sattler, P