-
Detecting Edited Knowledge in Language Models
Authors:
Paul Youssef,
Zhixue Zhao,
Jörg Schlötterer,
Christin Seifert
Abstract:
Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training. However, KEs can be used for malicious applications, e.g., inserting misinformation and toxic content. Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users' trust in generative models and provide more transparency. Driven by this, we propose a novel task: detecting edited knowledge in language models. Given an edited model and a fact retrieved by a prompt from an edited model, the objective is to classify the knowledge as either unedited (based on the pre-training), or edited (based on subsequent editing). We instantiate the task with four KEs, two LLMs, and two datasets. Additionally, we propose using the hidden state representations and the probability distributions as features for the detection. Our results reveal that using these features as inputs to a simple AdaBoost classifier establishes a strong baseline. This classifier requires only a limited amount of data and maintains its performance even in cross-domain settings. Last, we find it more challenging to distinguish edited knowledge from unedited but related knowledge, highlighting the need for further research. Our work lays the groundwork for addressing malicious model editing, which is a critical challenge associated with the strong generative capabilities of LLMs.
Submitted 1 July, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
-
LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study
Authors:
Van Bach Nguyen,
Paul Youssef,
Jörg Schlötterer,
Christin Seifert
Abstract:
As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model's prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for two NLU tasks. We conduct a comprehensive comparison of several common LLMs, and evaluate their CFs, assessing both intrinsic metrics, and the impact of these CFs on data augmentation. Moreover, we analyze differences between human and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs, but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than NLI where LLMs show weaknesses in generating CFs that flip the original label. This also reflects on the data augmentation performance, where we observe a large gap between augmenting with human and LLM CFs. Furthermore, we evaluate LLMs' ability to assess CFs in a mislabelled data setting, and show that they have a strong bias towards agreeing with the provided labels. GPT4 is more robust against this bias and its scores correlate well with automatic metrics. Our findings reveal several limitations and point to potential future work directions.
Submitted 26 April, 2024;
originally announced May 2024.
-
Central Limit Theorem for tensor products of free variables
Authors:
Cécilia Lancien,
Patrick Oliveira Santos,
Pierre Youssef
Abstract:
We establish a central limit theorem for tensor product random variables $c_k:=a_k \otimes a_k$, where $(a_k)_{k \in \mathbb{N}}$ is a free family of variables. We show that if the variables $a_k$ are centered, the limiting law is the semi-circle. Otherwise, the limiting law depends on the mean and variance of the variables $a_k$ and corresponds to a free interpolation between the semi-circle law and the classical convolution of two semi-circle laws.
Submitted 30 April, 2024;
originally announced April 2024.
-
How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse
Authors:
Mohamed El Amine Seddik,
Suei-Wen Chen,
Soufiane Hayou,
Pierre Youssef,
Merouane Debbah
Abstract:
The phenomenon of model collapse, introduced in (Shumailov et al., 2023), refers to the deterioration in performance that occurs when new models are trained on synthetic data generated from previously trained models. This recursive training loop makes the tails of the original distribution disappear, thereby making future-generation models forget about the initial (real) distribution. With the aim of rigorously understanding model collapse in language models, we consider in this paper a statistical model that allows us to characterize the impact of various recursive training scenarios. Specifically, we demonstrate that model collapse cannot be avoided when training solely on synthetic data. However, when mixing both real and synthetic data, we provide an estimate of a maximal amount of synthetic data below which model collapse can eventually be avoided. Our theoretical conclusions are further supported by empirical validations.
Submitted 7 April, 2024;
originally announced April 2024.
-
The Queen of England is not England's Queen: On the Lack of Factual Coherency in PLMs
Authors:
Paul Youssef,
Jörg Schlötterer,
Christin Seifert
Abstract:
Factual knowledge encoded in Pre-trained Language Models (PLMs) enriches their representations and justifies their use as knowledge bases. Previous work has focused on probing PLMs for factual knowledge by measuring how often they can correctly predict an object entity given a subject and a relation, and improving fact retrieval by optimizing the prompts used for querying PLMs. In this work, we consider a complementary aspect, namely the coherency of factual knowledge in PLMs, i.e., how often can PLMs predict the subject entity given its initial prediction of the object entity. This goes beyond evaluating how much PLMs know, and focuses on the internal state of knowledge inside them. Our results indicate that PLMs have low coherency using manually written, optimized and paraphrased prompts, but including an evidence paragraph leads to substantial improvement. This shows that PLMs fail to model inverse relations and need further enhancements to be able to handle retrieving facts from their parameters in a coherent manner, and to be considered as knowledge bases.
Submitted 2 February, 2024;
originally announced February 2024.
-
On spectral outliers of inhomogeneous symmetric random matrices
Authors:
Dylan J. Altschuler,
Patrick Oliveira Santos,
Konstantin Tikhomirov,
Pierre Youssef
Abstract:
Sharp conditions for the presence of spectral outliers are well understood for Wigner random matrices with iid entries. In the setting of inhomogeneous symmetric random matrices (i.e., matrices with a non-trivial variance profile), the corresponding problem has been considered only recently. Of special interest is the setting of sparse inhomogeneous matrices since sparsity is both a key feature and a technical obstacle in various aspects of random matrix theory. For such matrices, the largest of the variances of the entries has been used in the literature as a natural proxy for sparsity. We contribute sharp conditions in terms of this parameter for an inhomogeneous symmetric matrix with sub-Gaussian entries to have outliers. Our result implies a ``structural'' universality principle: the presence of outliers is only determined by the level of sparsity, rather than the detailed structure of the variance profile.
Submitted 25 January, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Limiting spectral distribution of random self-adjoint quantum channels
Authors:
Cécilia Lancien,
Patrick Oliveira Santos,
Pierre Youssef
Abstract:
We study the limiting spectral distribution of quantum channels whose Kraus operators are sampled as $n\times n$ random Hermitian matrices satisfying certain assumptions. We show that when the Kraus rank goes to infinity with n, the limiting spectral distribution (suitably rescaled) of the corresponding quantum channel coincides with the semi-circle distribution. When the Kraus rank is fixed, the limiting spectral distribution is no longer the semi-circle distribution. It corresponds to an explicit law, which can also be described using tools from free probability.
Submitted 21 November, 2023;
originally announced November 2023.
-
Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models
Authors:
Paul Youssef,
Osman Alperen Koraş,
Meijie Li,
Jörg Schlötterer,
Christin Seifert
Abstract:
Pre-trained Language Models (PLMs) are trained on vast unlabeled data, rich in world knowledge. This fact has sparked the interest of the community in quantifying the amount of factual knowledge present in PLMs, as this explains their performance on downstream tasks, and potentially justifies their use as knowledge bases. In this work, we survey methods and datasets that are used to probe PLMs for factual knowledge. Our contributions are: (1) We propose a categorization scheme for factual probing methods that is based on how their inputs, outputs and the probed PLMs are adapted; (2) We provide an overview of the datasets used for factual probing; (3) We synthesize insights about knowledge retention and prompt optimization in PLMs, analyze obstacles to adopting PLMs as knowledge bases and outline directions for future work.
Submitted 4 December, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Guidance in Radiology Report Summarization: An Empirical Evaluation and Error Analysis
Authors:
Jan Trienes,
Paul Youssef,
Jörg Schlötterer,
Christin Seifert
Abstract:
Automatically summarizing radiology reports into a concise impression can reduce the manual burden of clinicians and improve the consistency of reporting. Previous work aimed to enhance content selection and factuality through guided abstractive summarization. However, two key issues persist. First, current methods heavily rely on domain-specific resources to extract the guidance signal, limiting their transferability to domains and languages where those resources are unavailable. Second, while automatic metrics like ROUGE show progress, we lack a good understanding of the errors and failure modes in this task. To bridge these gaps, we first propose a domain-agnostic guidance signal in the form of variable-length extractive summaries. Our empirical results on two English benchmarks demonstrate that this guidance signal improves upon unguided summarization while being competitive with domain-specific methods. Additionally, we run an expert evaluation of four systems according to a taxonomy of 11 fine-grained errors. We find that the most pressing differences between automatic summaries and those of radiologists relate to content selection including omissions (up to 52%) and additions (up to 57%). We hypothesize that latent reporting factors and corpus-level inconsistencies may limit models' ability to reliably learn content selection from the available data, presenting promising directions for future work.
Submitted 24 July, 2023;
originally announced July 2023.
-
Flat bands of periodic graphs
Authors:
Mostafa Sabri,
Pierre Youssef
Abstract:
We study flat bands of periodic graphs in a Euclidean space. These are infinitely degenerate eigenvalues of the corresponding adjacency matrix, with eigenvectors of compact support. We provide some optimal recipes to generate desired bands, some sufficient conditions for a graph to have flat bands, we characterize the set of flat bands whose eigenvectors occupy a single cell and we compute the list of such bands for small cells. We next discuss stability and rarity of flat bands in special cases. Additional folklore results are proved and many questions are still open.
Submitted 27 April, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
-
A note on quantum expanders
Authors:
Cécilia Lancien,
Pierre Youssef
Abstract:
We prove that a wide class of random quantum channels with few Kraus operators, sampled as random matrices with some moment assumptions, exhibit a large spectral gap, and are therefore optimal quantum expanders. In particular, our result provides a recipe to construct random quantum expanders from their classical (random or deterministic) counterparts. This considerably enlarges the list of known constructions of optimal quantum expanders, which was previously limited to few examples. Our proofs rely on recent progress in the study of the operator norm of random matrices with dependence and non-homogeneity, which we expect to have further applications in several areas of quantum information.
Submitted 23 February, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
Monotonicity of the logarithmic energy for random matrices
Authors:
Djalil Chafaï,
Benjamin Dadoun,
Pierre Youssef
Abstract:
It is well-known that the semi-circle law, which is the limiting distribution in the Wigner theorem, is the minimizer of the logarithmic energy penalized by the second moment. A very similar fact holds for the Girko and Marchenko--Pastur theorems. In this work, we shed light on an intriguing phenomenon suggesting that this functional is monotonic along the mean empirical spectral distribution in terms of the matrix dimension. This is reminiscent of the monotonicity of the Boltzmann entropy along the Boltzmann equation, the monotonicity of the free energy along ergodic Markov processes, and the Shannon monotonicity of entropy or free entropy along the classical or free central limit theorem. While we only verify this monotonicity phenomenon for the Gaussian unitary ensemble, the complex Ginibre ensemble, and the square Laguerre unitary ensemble, numerical simulations suggest that it is actually more universal. Along the way, we obtain explicit formulas for the logarithmic energy of the mentioned models, which can be of independent interest.
Submitted 8 April, 2024; v1 submitted 12 December, 2022;
originally announced December 2022.
-
Upgrading MLSI to LSI for reversible Markov chains
Authors:
Justin Salez,
Konstantin Tikhomirov,
Pierre Youssef
Abstract:
For reversible Markov chains on finite state spaces, we show that the modified log-Sobolev inequality (MLSI) can be upgraded to a log-Sobolev inequality (LSI) at the surprisingly low cost of degrading the associated constant by $\log (1/p)$, where $p$ is the minimum non-zero transition probability. We illustrate this by providing the first log-Sobolev estimate for Zero-Range processes on arbitrary graphs. As another application, we determine the modified log-Sobolev constant of the Lamplighter chain on all bounded-degree graphs, and use it to provide negative answers to two open questions by Montenegro and Tetali (2006) and Hermon and Peres (2018). Our proof builds upon the `regularization trick' recently introduced by the last two authors.
Submitted 12 December, 2022;
originally announced December 2022.
-
Online Decentralized Frank-Wolfe: From theoretical bound to applications in smart-building
Authors:
Angan Mitra,
Nguyen Kim Thang,
Tuan-Anh Nguyen,
Denis Trystram,
Paul Youssef
Abstract:
The design of decentralized learning algorithms is important in the fast-growing world in which data are distributed over participants with limited local computation resources and communication. In this direction, we propose an online algorithm minimizing non-convex loss functions aggregated from individual data/models distributed over a network. We provide the theoretical performance guarantee of our algorithm and demonstrate its utility on a real life smart building.
Submitted 31 July, 2022;
originally announced August 2022.
-
Online 2-stage Stable Matching
Authors:
Evripidis Bampis,
Bruno Escoffier,
Paul Youssef
Abstract:
We focus on an online 2-stage problem, motivated by the following situation: consider a system where students shall be assigned to universities. There is a first round where some students apply, and a first (stable) matching $M_1$ has to be computed. However, some students may decide to leave the system (change their plan, go to a foreign university, or to some institution not in the system). Then, in a second round (after these deletions), we shall compute a second (final) stable matching $M_2$. As it is undesirable to change assignments, the goal is to minimize the number of divorces/modifications between the two stable matchings $M_1$ and $M_2$. Then, how should we choose $M_1$ and $M_2$? We show that there is an {\it optimal online} algorithm to solve this problem. In particular, thanks to a dominance property, we show that we can optimally compute $M_1$ without knowing the students that will leave the system. We generalize the result to some other possible modifications in the input (students, open positions).
We also tackle the case of more stages, showing that no competitive (online) algorithm can be achieved for the considered problem as soon as there are 3 stages.
Submitted 2 May, 2023; v1 submitted 5 July, 2022;
originally announced July 2022.
-
Regularized modified log-Sobolev inequalities, and comparison of Markov chains
Authors:
Konstantin Tikhomirov,
Pierre Youssef
Abstract:
In this work, we develop a comparison procedure for the Modified log-Sobolev Inequality (MLSI) constants of two reversible Markov chains on a finite state space. Efficient comparison of the MLSI Dirichlet forms is a well-known obstacle in the theory of Markov chains. We approach this problem by introducing a {\it regularized} MLSI constant which, under some assumptions, has the same order of magnitude as the usual MLSI constant yet is amenable to comparison and thus considerably simpler to estimate in certain cases. As an application of this general comparison procedure, we provide a sharp estimate of the MLSI constant of the switch chain on the set of simple bipartite regular graphs of size $n$ with a fixed degree $d$. Our estimate implies that the total variation mixing time of the switch chain is of order $O_d(n\log n)$. The result is optimal up to a multiple depending on $d$ and resolves a long-standing open problem. We expect that the MLSI comparison technique implemented in this paper will find further applications.
Submitted 24 June, 2022;
originally announced June 2022.
-
Maximal correlation and monotonicity of free entropy and Stein discrepancy
Authors:
Benjamin Dadoun,
Pierre Youssef
Abstract:
We introduce the maximal correlation coefficient $R(M_1,M_2)$ between two noncommutative probability subspaces $M_1$ and $M_2$ and show that the maximal correlation coefficient between the sub-algebras generated by $s_n:=x_1+\ldots +x_n$ and $s_m:=x_1+\ldots +x_m$ equals $\sqrt{m/n}$ for $m\le n$, where $(x_i)_{i\in \mathbb{N}}$ is a sequence of free and identically distributed noncommutative random variables. This is the free-probability analogue of a result by Dembo--Kagan--Shepp in classical probability. As an application, we use this estimate to provide another simple proof of the monotonicity of the free entropy and free Fisher information in the free central limit theorem. Moreover, we prove that the free Stein Discrepancy introduced by Fathi and Nelson is non-increasing along the free central limit theorem.
Submitted 8 February, 2023; v1 submitted 5 November, 2020;
originally announced November 2020.
-
Sharp Poincaré and log-Sobolev inequalities for the switch chain on regular bipartite graphs
Authors:
Konstantin Tikhomirov,
Pierre Youssef
Abstract:
Consider the switch chain on the set of $d$-regular bipartite graphs on $n$ vertices with $3\leq d\leq n^{c}$, for a small universal constant $c>0$. We prove that the chain satisfies a Poincaré inequality with a constant of order $O(nd)$; moreover, when $d$ is fixed, we establish a log-Sobolev inequality for the chain with a constant of order $O_d(n\log n)$. We show that both results are optimal. The Poincaré inequality implies that in the regime $3\leq d\leq n^c$ the mixing time of the switch chain is at most $O\big((nd)^2 \log(nd)\big)$, improving on the previously known bound $O\big((nd)^{13} \log(nd)\big)$ due to Kannan, Tetali and Vempala and $O\big(n^7d^{18} \log(nd)\big)$ obtained by Dyer et al. The log-Sobolev inequality that we establish for constant $d$ implies a bound $O(n\log^2 n)$ on the mixing time of the chain which, up to the $\log n$ factor, captures a conjectured optimal bound. Our proof strategy relies on building, for any fixed function on the set of $d$-regular bipartite simple graphs, an appropriate extension to a function on the set of multigraphs given by the configuration model. We then establish a comparison procedure with the well studied random transposition model in order to obtain the corresponding functional inequalities. While our method falls into a rich class of comparison techniques for Markov chains on different state spaces, the crucial feature of the method - dealing with chains with a large distortion between their stationary measures - is a novel addition to the theory.
Submitted 22 May, 2022; v1 submitted 6 July, 2020;
originally announced July 2020.
-
When is ACL's Deadline? A Scientific Conversational Agent
Authors:
Mohsen Mesgar,
Paul Youssef,
Lin Li,
Dominik Bierwirth,
Yihao Li,
Christian M. Meyer,
Iryna Gurevych
Abstract:
Our conversational agent UKP-ATHENA assists NLP researchers in finding and exploring scientific literature, identifying relevant authors, planning or post-processing conference visits, and preparing paper submissions using a unified interface based on natural language inputs and responses. UKP-ATHENA enables new access paths to our swiftly evolving research area with its massive amounts of scientific information and high turnaround times. UKP-ATHENA's responses connect information from multiple heterogeneous sources which researchers currently have to explore manually one after another. Unlike a search engine, UKP-ATHENA maintains the context of a conversation to allow for efficient information access on papers, researchers, and conferences. Our architecture consists of multiple components with reference implementations that can be easily extended by new skills and domains. Our user-based evaluation shows that UKP-ATHENA already responds to 45% of different formulations of defined intents with a 37% information coverage rate.
Submitted 23 November, 2019;
originally announced November 2019.
-
Matrix Poincaré inequalities and concentration
Authors:
Richard Aoun,
Marwa Banna,
Pierre Youssef
Abstract:
We show that any probability measure satisfying a Matrix Poincaré inequality with respect to some reversible Markov generator satisfies an exponential matrix concentration inequality depending on the associated matrix carré du champ operator. This extends to the matrix setting a classical phenomenon in the scalar case. Moreover, the proof gives rise to new matrix trace inequalities which could be of independent interest. We then apply this general fact by establishing matrix Poincaré inequalities to derive matrix concentration inequalities for Gaussian measures, product measures and for Strong Rayleigh measures. The latter represents the first instance of matrix concentration for general matrix functions of negatively dependent random variables.
Submitted 30 May, 2020; v1 submitted 30 October, 2019;
originally announced October 2019.
-
Outliers in spectrum of sparse Wigner matrices
Authors:
Konstantin Tikhomirov,
Pierre Youssef
Abstract:
In this paper, we study the effect of sparsity on the appearance of outliers in the semi-circular law. Let $(W_n)_{n=1}^\infty$ be a sequence of random symmetric matrices such that each $W_n$ is $n\times n$ with i.i.d entries above and on the main diagonal equidistributed with the product $b_nξ$, where $ξ$ is a real centered uniformly bounded random variable of unit variance and $b_n$ is an independent Bernoulli random variable with a probability of success $p_n$. Assuming that $\lim\limits_{n\to\infty}n p_n=\infty$, we show that for the random sequence $(ρ_n)_{n=1}^\infty$ given by $$ρ_n:=θ_n+\frac{n p_n}{θ_n},\quad θ_n:=\sqrt{\max\big(\max\limits_{i\leq n}\|{\rm Row_i}(W_n)\|_2^2-np_n,n p_n\big)},$$ the ratio $\frac{\|W_n\|}{ρ_n}$ converges to one in probability. A non-centered counterpart of the theorem allows to obtain asymptotic expressions for eigenvalues of the Erdős--Renyi graphs, which were unknown in the regime $n p_n=Θ(\log n)$. In particular, denoting by $A_n$ the adjacency matrix of $\mathcal{G}(n,p_n)$ and by $λ_{|k|}(A_n)$ its $k$-th largest (by the absolute value) eigenvalue, under the assumptions $\lim\limits_{n\to\infty }n p_n=\infty$ and $\lim\limits_{n\to\infty}p_n=0$ we have:
-(No non-trivial outliers) If $\liminf\frac{n p_n}{\log n}\geq\frac{1}{\log (4/e)}$ then for any fixed $k\geq2$, $\frac{|λ_{|k|}(A_n)|}{2\sqrt{n p_n}}$ converges to $1$ in probability.
-(Outliers) If $\limsup\frac{n p_n}{\log n}<\frac{1}{\log (4/e)}$ then there is $\varepsilon>0$ such that for any $k\in\mathbb{N}$, we have $\lim\limits_{n\to\infty}\mathbb{P}\Big\{\frac{|λ_{|k|}(A_n)|}{2\sqrt{n p_n}}>1+\varepsilon\Big\}=1$.
On a conceptual level, our result highlights similarities in appearance of outliers in spectrum of sparse matrices and the so-called BBP phase transition phenomenon in deformed Wigner matrices.
Submitted 23 May, 2019; v1 submitted 16 April, 2019;
originally announced April 2019.
-
Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks
Authors:
Steffen Eger,
Paul Youssef,
Iryna Gurevych
Abstract:
Activation functions play a crucial role in neural networks because they are the nonlinearities which have been attributed to the success story of deep learning. One of the currently most popular activation functions is ReLU, but several competitors have recently been proposed or 'discovered', including LReLU functions and swish. While most works compare newly proposed activation functions on few tasks (usually from image classification) and against few competitors (usually ReLU), we perform the first large-scale comparison of 21 activation functions across eight different NLP tasks. We find that a largely unknown activation function performs most stably across all tasks, the so-called penalized tanh function. We also show that it can successfully replace the sigmoid and tanh gates in LSTM cells, leading to a 2 percentage point (pp) improvement over the standard choices on a challenging NLP task.
Submitted 9 January, 2019;
originally announced January 2019.
-
The rank of random regular digraphs of constant degree
Authors:
Alexander Litvak,
Anna Lytova,
Konstantin Tikhomirov,
Nicole Tomczak-Jaegermann,
Pierre Youssef
Abstract:
Let $d$ be a fixed large integer. For any $n$ larger than $d$, let $A_n$ be the adjacency matrix of the random directed $d$-regular graph on $n$ vertices, with the uniform distribution. We show that $A_n$ has rank at least $n-1$ with probability going to one as $n$ goes to infinity. The proof combines the method of simple switchings and a recent result of the authors on delocalization of eigenvectors of $A_n$.
Submitted 18 July, 2018; v1 submitted 17 January, 2018;
originally announced January 2018.
-
Circular law for sparse random regular digraphs
Authors:
Alexander Litvak,
Anna Lytova,
Konstantin Tikhomirov,
Nicole Tomczak-Jaegermann,
Pierre Youssef
Abstract:
Fix a constant $C\geq 1$ and let $d=d(n)$ satisfy $d\leq \ln^{C} n$ for every large integer $n$. Denote by $A_n$ the adjacency matrix of a uniform random directed $d$-regular graph on $n$ vertices. We show that, as long as $d\to\infty$ with $n$, the empirical spectral distribution of appropriately rescaled matrix $A_n$ converges weakly in probability to the circular law. This result, together with an earlier work of Cook, completely settles the problem of weak convergence of the empirical distribution in directed $d$-regular setting with the degree tending to infinity. As a crucial element of our proof, we develop a technique of bounding intermediate singular values of $A_n$ based on studying random normals to rowspaces and on constructing a product structure to deal with the lack of independence between the matrix entries.
Submitted 21 January, 2018; v1 submitted 17 January, 2018;
originally announced January 2018.
-
Structure of eigenvectors of random regular digraphs
Authors:
Alexander Litvak,
Anna Lytova,
Konstantin Tikhomirov,
Nicole Tomczak-Jaegermann,
Pierre Youssef
Abstract:
Let $d$ and $n$ be integers satisfying $C\leq d\leq \exp(c\sqrt{\ln n})$ for some universal constants $c, C>0$, and let $z\in \mathbb{C}$. Denote by $M$ the adjacency matrix of a random $d$-regular directed graph on $n$ vertices. In this paper, we study the structure of the kernel of submatrices of $M-z\,{\rm Id}$, formed by removing a subset of rows. We show that with large probability the kernel consists of two non-intersecting types of vectors, which we call very steep and gradual with many levels. As a corollary, we show, in particular, that every eigenvector of $M$, except for constant multiples of $(1,1,\dots,1)$, possesses a weak delocalization property: its level sets have cardinality less than $Cn\ln^2 d/\ln n$. For a large constant $d$ this provides a principally new structural information on eigenvectors, implying that the number of their level sets grows to infinity with $n$. As a key technical ingredient of our proofs we introduce a decomposition of $\mathbb{C}^n$ into vectors of different degrees of `structuredness', which is an alternative to the decomposition based on the least common denominator in the regime when the underlying random matrix is very sparse.
Submitted 25 October, 2018; v1 submitted 17 January, 2018;
originally announced January 2018.
-
The dimension-free structure of nonhomogeneous random matrices
Authors:
Rafał Latała,
Ramon van Handel,
Pierre Youssef
Abstract:
Let $X$ be a symmetric random matrix with independent but non-identically distributed centered Gaussian entries. We show that $$
\mathbf{E}\|X\|_{S_p} \asymp
\mathbf{E}\Bigg[
\Bigg(\sum_i\Bigg(\sum_j X_{ij}^2\Bigg)^{p/2}\Bigg)^{1/p}
\Bigg] $$ for any $2\le p\le\infty$, where $S_p$ denotes the $p$-Schatten class and the constants are universal. The right-hand side admits an explicit expression in terms of the variances of the matrix entries. This settles, in the case $p=\infty$, a conjecture of the first author, and provides a complete characterization of the class of infinite matrices with independent Gaussian entries that define bounded operators on $\ell_2$. Along the way, we obtain optimal dimension-free bounds on the moments $(\mathbf{E}\|X\|_{S_p}^p)^{1/p}$ that are of independent interest. We develop further extensions to non-symmetric matrices and to nonasymptotic moment and norm estimates for matrices with non-Gaussian entries that arise, for example, in the study of random graphs and in applied mathematics.
Submitted 21 August, 2018; v1 submitted 2 November, 2017;
originally announced November 2017.
-
The smallest singular value of a shifted $d$-regular random square matrix
Authors:
Alexander Litvak,
Anna Lytova,
Konstantin Tikhomirov,
Nicole Tomczak-Jaegermann,
Pierre Youssef
Abstract:
We derive a lower bound on the smallest singular value of a random $d$-regular matrix, that is, the adjacency matrix of a random $d$-regular directed graph. More precisely, let $C_1<d< c_1 n/\log^2 n$ and let $\mathcal{M}_{n,d}$ be the set of all $0/1$-valued square $n\times n$ matrices such that each row and each column of a matrix $M\in \mathcal{M}_{n,d}$ has exactly $d$ ones. Let $M$ be uniformly distributed on $\mathcal{M}_{n,d}$. Then the smallest singular value $s_{n} (M)$ of $M$ is greater than $c_2 n^{-6}$ with probability at least $1-C_2\log^2 d/\sqrt{d}$, where $c_1$, $c_2$, $C_1$, and $C_2$ are absolute positive constants independent of any other parameters.
Submitted 18 July, 2018; v1 submitted 9 July, 2017;
originally announced July 2017.
-
The spectral gap of dense random regular graphs
Authors:
Konstantin Tikhomirov,
Pierre Youssef
Abstract:
For any $α\in (0,1)$ and any $n^α\leq d\leq n/2$, we show that $λ(G)\leq C_α\sqrt{d}$ with probability at least $1-\frac{1}{n}$, where $G$ is the uniform random $d$-regular graph on $n$ vertices, $λ(G)$ denotes its second largest eigenvalue (in absolute value) and $C_α$ is a constant depending only on $α$. Combined with earlier results in this direction covering the case of sparse random graphs, this completely settles the problem of estimating the magnitude of $λ(G)$, up to a multiplicative constant, for all values of $n$ and $d$, confirming a conjecture of Vu. The result is obtained as a consequence of an estimate for the second largest singular value of adjacency matrices of random {\it directed} graphs with predefined degree sequences. As the main technical tool, we prove a concentration inequality for arbitrary linear forms on the space of matrices, where the probability measure is induced by the adjacency matrix of a random directed graph with prescribed degree sequences. The proof is a non-trivial application of the Freedman inequality for martingales, combined with bootstrapping and tensorization arguments. Our method bears considerable differences compared to the approach used by Broder, Frieze, Suen and Upfal (1999) who established the upper bound for $λ(G)$ for $d=o(\sqrt{n})$, and to the argument of Cook, Goldstein and Johnson (2015) who derived a concentration inequality for linear forms and estimated $λ(G)$ in the range $d= O(n^{2/3})$ using size-biased couplings.
Submitted 18 November, 2016; v1 submitted 6 October, 2016;
originally announced October 2016.
-
On the norm of a random jointly exchangeable matrix
Authors:
Konstantin Tikhomirov,
Pierre Youssef
Abstract:
In this note, we show that the norm of an $n\times n$ random jointly exchangeable matrix with zero diagonal can be estimated in terms of the norm of its $n/2\times n/2$ submatrix located in the top right corner. As a consequence, we prove a relation between the second largest singular values of a random matrix with constant row and column sums and its top right $n/2\times n/2$ submatrix. The result has an application to estimating the spectral gap of random undirected $d$-regular graphs in terms of the second singular value of {\it directed} random graphs with predefined degree sequences.
Submitted 18 November, 2016; v1 submitted 6 October, 2016;
originally announced October 2016.
-
Approximating matrices and convex bodies through Kadison-Singer
Authors:
Omer Friedland,
Pierre Youssef
Abstract:
We show that any $n\times m$ matrix $A$ can be approximated in operator norm by a submatrix with a number of columns of order the stable rank of $A$. This improves on existing results by removing an extra logarithmic factor in the size of the extracted matrix. Our proof uses the recent solution of the Kadison-Singer problem. We also develop a sort of tensorization technique to deal with constraint approximation problems. As an application, we provide a sparsification result with equal weights and an optimal approximate John's decomposition for non-symmetric convex bodies. This enables us to show that any convex body in $\mathbb{R}^n$ is arbitrarily close to another one having $O(n)$ contact points and fills the gap left in the literature after the results of Rudelson and Srivastava by completely answering the problem. As a consequence, we also show that the method developed by Guédon, Gordon and Meyer to establish the isomorphic Dvoretzky theorem yields the best known result once we inject our improvements.
Submitted 24 January, 2017; v1 submitted 12 May, 2016;
originally announced May 2016.
-
Restricted invertibility revisited
Authors:
Assaf Naor,
Pierre Youssef
Abstract:
Suppose that $m,n\in \mathbb{N}$ and that $A:\mathbb{R}^m\to \mathbb{R}^n$ is a linear operator. It is shown here that if $k,r\in \mathbb{N}$ satisfy $k<r\le \mathrm{\bf rank(A)}$ then there exists a subset $σ\subseteq \{1,\ldots,m\}$ with $|σ|=k$ such that the restriction of $A$ to $\mathbb{R}^σ\subseteq \mathbb{R}^m$ is invertible, and moreover the operator norm of the inverse $A^{-1}:A(\mathbb{R}^σ)\to \mathbb{R}^m$ is at most a constant multiple of the quantity $\sqrt{mr/((r-k)\sum_{i=r}^m \mathsf{s}_i(A)^2)}$, where $\mathsf{s}_1(A)\geqslant\ldots\geqslant \mathsf{s}_m(A)$ are the singular values of $A$. This improves over a series of works, starting from the seminal Bourgain--Tzafriri Restricted Invertibility Principle, through the works of Vershynin, Spielman--Srivastava and Marcus--Spielman--Srivastava. In particular, this directly implies an improved restricted invertibility principle in terms of Schatten--von Neumann norms.
Submitted 25 November, 2016; v1 submitted 5 January, 2016;
originally announced January 2016.
-
Adjacency matrices of random digraphs: singularity and anti-concentration
Authors:
Alexander E. Litvak,
Anna Lytova,
Konstantin Tikhomirov,
Nicole Tomczak-Jaegermann,
Pierre Youssef
Abstract:
Let ${\mathcal D}_{n,d}$ be the set of all $d$-regular directed graphs on $n$ vertices. Let $G$ be a graph chosen uniformly at random from ${\mathcal D}_{n,d}$ and $M$ be its adjacency matrix. We show that $M$ is invertible with probability at least $1-C\ln^{3} d/\sqrt{d}$ for $C\leq d\leq cn/\ln^2 n$, where $c, C$ are positive absolute constants. To this end, we establish a few properties of $d$-regular directed graphs. One of them, a Littlewood-Offord type anti-concentration property, is of independent interest. Let $J$ be a subset of vertices of $G$ with $|J|\approx n/d$. Let $δ_i$ be the indicator of the event that the vertex $i$ is connected to $J$ and define $δ= (δ_1, δ_2, ..., δ_n)\in \{0, 1\}^n$. Then for every $v\in\{0,1\}^n$ the probability that $δ=v$ is exponentially small. This property holds even if a part of the graph is "frozen".
Submitted 18 October, 2016; v1 submitted 31 October, 2015;
originally announced November 2015.
-
Bernstein type inequality for a class of dependent random matrices
Authors:
Marwa Banna,
Florence Merlevède,
Pierre Youssef
Abstract:
In this paper we obtain a Bernstein type inequality for the sum of self-adjoint centered and geometrically absolutely regular random matrices with bounded largest eigenvalue. This inequality can be viewed as an extension to the matrix setting of the Bernstein-type inequality obtained by Merlevède et al. (2009) in the context of real-valued bounded random variables that are geometrically absolutely regular. The proofs rely on decoupling the Laplace transform of a sum on a Cantor-like set of random matrices.
Submitted 22 April, 2015;
originally announced April 2015.
-
Minimax of an n-dimensional Brownian motion
Authors:
Konstantin Tikhomirov,
Pierre Youssef
Abstract:
For some absolute constants $c$, $n_0$ and any $n\geq n_0$, we show that with probability close to one the convex hull of the $n$-dimensional Brownian motion ${\rm conv}\{BM_n(t):\, t\in[1,2^{cn}]\}$ does not contain the origin. The result can be interpreted as an estimate of the minimax of the Gaussian process $\{ \langle \bar{u},BM_n(t)\rangle,\, \bar{u}\in S^{n-1},\, t\in [1,2^{cn}]\}$.
Submitted 7 April, 2015;
originally announced April 2015.
-
When does a discrete-time random walk in $\mathbb{R}^n$ absorb the origin into its convex hull?
Authors:
Konstantin Tikhomirov,
Pierre Youssef
Abstract:
We connect this question to a problem of estimating the probability that the image of certain random matrices does not intersect with a subset of the unit sphere $\mathbb{S}^{n-1}$. In this way, the case of a discretized Brownian motion is related to Gordon's escape theorem dealing with standard Gaussian matrices. The approach allows us to prove that with high probability, the $π/2$-covering time of certain random walks on $\mathbb{S}^{n-1}$ is of order $n$. For certain spherical simplices on $\mathbb{S}^{n-1}$, we extend the "escape" phenomenon to a broad class of random matrices; as an application, we show that $e^{Cn}$ steps are sufficient for the standard walk on $\mathbb{Z}^n$ to absorb the origin into its convex hull with a high probability.
Submitted 1 November, 2015; v1 submitted 2 October, 2014;
originally announced October 2014.
-
Extracting a basis with fixed block inside a matrix
Authors:
Pierre Youssef
Abstract:
Given $U$ an $n\times m$ matrix of rank $n$ and $V$ block of columns inside $U$, we consider the problem of extracting a block of columns of rank $n$ which minimize the Hilbert-Schmidt norm of the inverse while preserving the block $V$. This generalizes a previous result of Gluskin-Olevskii, and improves the estimates when given a "good" block $V$.
Submitted 13 November, 2015; v1 submitted 24 January, 2014;
originally announced January 2014.
-
Estimating the covariance of random matrices
Authors:
Pierre Youssef
Abstract:
We extend to the matrix setting a recent result of Srivastava-Vershynin about estimating the covariance matrix of a random vector. The result can be interpreted as a quantified version of the law of large numbers for positive semi-definite matrices which verify some regularity assumption. Besides giving examples, we discuss the notion of log-concave matrices and give estimates on the smallest and largest eigenvalues of a sum of such matrices.
Submitted 5 December, 2013; v1 submitted 28 January, 2013;
originally announced January 2013.
-
A note on column subset selection
Authors:
Pierre Youssef
Abstract:
Given a matrix U, using a deterministic method, we extract a "large" submatrix of U' (whose columns are obtained by normalizing those of U) and estimate its smallest and largest singular values. We apply this result to the study of contact points of the unit ball with its maximal volume ellipsoid. We also consider the paving problem and give a deterministic algorithm to partition a matrix into almost isometric blocks, recovering previous results of Bourgain-Tzafriri and Tropp. Finally, we partially answer a question raised by Naor about finding an algorithm in the spirit of Batson-Spielman-Srivastava's work to extract a "large" square submatrix of "small" norm.
Submitted 25 October, 2013; v1 submitted 5 December, 2012;
originally announced December 2012.
-
Restricted Invertibility and the Banach-Mazur distance to the cube
Authors:
Pierre Youssef
Abstract:
We prove a normalized version of the restricted invertibility principle obtained by Spielman-Srivastava. Applying this result, we get a new proof of the proportional Dvoretzky-Rogers factorization theorem recovering the best current estimate. As a consequence, we also recover the best known estimate for the Banach-Mazur distance to the cube: the distance of every $n$-dimensional normed space from $\ell_{\infty}^n$ is at most $(2n)^{5/6}$. Finally, using tools from the work of Batson-Spielman-Srivastava, we give a new proof for a theorem of Kashin-Tzafriri on the norm of restricted matrices.
Submitted 12 June, 2013; v1 submitted 4 June, 2012;
originally announced June 2012.
-
Expected Performance of the ATLAS Experiment - Detector, Trigger and Physics
Authors:
The ATLAS Collaboration,
G. Aad,
E. Abat,
B. Abbott,
J. Abdallah,
A. A. Abdelalim,
A. Abdesselam,
O. Abdinov,
B. Abi,
M. Abolins,
H. Abramowicz,
B. S. Acharya,
D. L. Adams,
T. N. Addy,
C. Adorisio,
P. Adragna,
T. Adye,
J. A. Aguilar-Saavedra,
M. Aharrouche,
S. P. Ahlen,
F. Ahles,
A. Ahmad,
H. Ahmed,
G. Aielli,
T. Akdogan
, et al. (2587 additional authors not shown)
Abstract:
A detailed study is presented of the expected performance of the ATLAS detector. The reconstruction of tracks, leptons, photons, missing energy and jets is investigated, together with the performance of b-tagging and the trigger. The physics potential for a variety of interesting physics processes, within the Standard Model and beyond, is examined. The study comprises a series of notes based on simulations of the detector and physics processes, with particular emphasis given to the data expected from the first years of operation of the LHC at CERN.
Submitted 14 August, 2009; v1 submitted 28 December, 2008;
originally announced January 2009.