Search | arXiv e-print repository

Detecting Edited Knowledge in Language Models

Authors: Paul Youssef, Zhixue Zhao, Jörg Schlötterer, Christin Seifert

Abstract: Knowledge editing methods (KEs) can update language models' obsolete or inaccurate knowledge learned from pre-training. However, KEs can be used for malicious applications, e.g., inserting misinformation and toxic content. Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users' trust in generative models and provide more transparency. Driven by this, we propose a novel task: detecting edited knowledge in language models. Given an edited model and a fact retrieved by a prompt from an edited model, the objective is to classify the knowledge as either unedited (based on the pre-training), or edited (based on subsequent editing). We instantiate the task with four KEs, two LLMs, and two datasets. Additionally, we propose using the hidden state representations and the probability distributions as features for the detection. Our results reveal that using these features as inputs to a simple AdaBoost classifier establishes a strong baseline. This classifier requires only a limited amount of data and maintains its performance even in cross-domain settings. Last, we find it more challenging to distinguish edited knowledge from unedited but related knowledge, highlighting the need for further research. Our work lays the groundwork for addressing malicious model editing, which is a critical challenge associated with the strong generative capabilities of LLMs.

Submitted 1 July, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

arXiv:2405.00722 [pdf, other]

LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study

Authors: Van Bach Nguyen, Paul Youssef, Jörg Schlötterer, Christin Seifert

Abstract: As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model's prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for two NLU tasks. We conduct a comprehensive comparison of several common LLMs, and evaluate their CFs, assessing both intrinsic metrics, and the impact of these CFs on data augmentation. Moreover, we analyze differences between human and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs, but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than for NLI, where LLMs show weaknesses in generating CFs that flip the original label. This is also reflected in the data augmentation performance, where we observe a large gap between augmenting with human and LLM-generated CFs. Furthermore, we evaluate LLMs' ability to assess CFs in a mislabelled data setting, and show that they have a strong bias towards agreeing with the provided labels. GPT4 is more robust against this bias and its scores correlate well with automatic metrics. Our findings reveal several limitations and point to potential future work directions.

Submitted 26 April, 2024; originally announced May 2024.

arXiv:2404.19662 [pdf, ps, other]

Central Limit Theorem for tensor products of free variables

Authors: Cécilia Lancien, Patrick Oliveira Santos, Pierre Youssef

Abstract: We establish a central limit theorem for tensor product random variables $c_k:=a_k \otimes a_k$, where $(a_k)_{k \in \mathbb{N}}$ is a free family of variables. We show that if the variables $a_k$ are centered, the limiting law is the semi-circle. Otherwise, the limiting law depends on the mean and variance of the variables $a_k$ and corresponds to a free interpolation between the semi-circle law and the classical convolution of two semi-circle laws.

Submitted 30 April, 2024; originally announced April 2024.

MSC Class: 46L54; 60F05; 47A80; 60B20

arXiv:2404.05090 [pdf, other]

How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse

Authors: Mohamed El Amine Seddik, Suei-Wen Chen, Soufiane Hayou, Pierre Youssef, Merouane Debbah

Abstract: The phenomenon of model collapse, introduced in (Shumailov et al., 2023), refers to the deterioration in performance that occurs when new models are trained on synthetic data generated from previously trained models. This recursive training loop makes the tails of the original distribution disappear, thereby making future-generation models forget about the initial (real) distribution. With the aim of rigorously understanding model collapse in language models, we consider in this paper a statistical model that allows us to characterize the impact of various recursive training scenarios. Specifically, we demonstrate that model collapse cannot be avoided when training solely on synthetic data. However, when mixing both real and synthetic data, we provide an estimate of a maximal amount of synthetic data below which model collapse can eventually be avoided. Our theoretical conclusions are further supported by empirical validations.

Submitted 7 April, 2024; originally announced April 2024.
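
The tail-loss mechanism described in the abstract above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the statistical model analyzed in the paper: the "model" here is a plain categorical distribution refit by empirical frequencies, and the names `next_generation` and `simulate`, the Zipf-like starting distribution, and all parameter values are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def next_generation(probs, n_samples, rng):
    """Sample n_samples tokens from probs and refit by empirical frequencies."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    counts = Counter(rng.choices(tokens, weights=weights, k=n_samples))
    # Tokens that were never sampled are absent from counts, so they are
    # dropped from the refit distribution and can never be sampled again.
    return {t: c / n_samples for t, c in counts.items()}


def simulate(vocab_size=50, n_samples=100, generations=30, seed=0):
    """Return the support size of the distribution after each generation."""
    rng = random.Random(seed)
    # Assumed starting point: a Zipf-like distribution with a long tail.
    raw = [1.0 / (i + 1) for i in range(vocab_size)]
    total = sum(raw)
    probs = {i: w / total for i, w in enumerate(raw)}
    support_sizes = [len(probs)]
    for _ in range(generations):
        # Each generation is trained only on samples from the previous one.
        probs = next_generation(probs, n_samples, rng)
        support_sizes.append(len(probs))
    return support_sizes


if __name__ == "__main__":
    print(simulate())
```

Because a token with zero estimated probability cannot reappear, the support size in this toy setting can only stay the same or shrink from one generation to the next, which is the purely synthetic regime the abstract refers to.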

arXiv:2402.01453 [pdf, other]

The Queen of England is not England's Queen: On the Lack of Factual Coherency in PLMs

Authors: Paul Youssef, Jörg Schlötterer, Christin Seifert

Abstract: Factual knowledge encoded in Pre-trained Language Models (PLMs) enriches their representations and justifies their use as knowledge bases. Previous work has focused on probing PLMs for factual knowledge by measuring how often they can correctly predict an object entity given a subject and a relation, and improving fact retrieval by optimizing the prompts used for querying PLMs. In this work, we consider a complementary aspect, namely the coherency of factual knowledge in PLMs, i.e., how often can PLMs predict the subject entity given its initial prediction of the object entity. This goes beyond evaluating how much PLMs know, and focuses on the internal state of knowledge inside them. Our results indicate that PLMs have low coherency using manually written, optimized and paraphrased prompts, but including an evidence paragraph leads to substantial improvement. This shows that PLMs fail to model inverse relations and need further enhancements to be able to handle retrieving facts from their parameters in a coherent manner, and to be considered as knowledge bases.

Submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted to EACL Findings 2024

arXiv:2401.07852 [pdf, ps, other]

On spectral outliers of inhomogeneous symmetric random matrices

Authors: Dylan J. Altschuler, Patrick Oliveira Santos, Konstantin Tikhomirov, Pierre Youssef

Abstract: Sharp conditions for the presence of spectral outliers are well understood for Wigner random matrices with iid entries. In the setting of inhomogeneous symmetric random matrices (i.e., matrices with a non-trivial variance profile), the corresponding problem has been considered only recently. Of special interest is the setting of sparse inhomogeneous matrices since sparsity is both a key feature and a technical obstacle in various aspects of random matrix theory. For such matrices, the largest of the variances of the entries has been used in the literature as a natural proxy for sparsity. We contribute sharp conditions in terms of this parameter for an inhomogeneous symmetric matrix with sub-Gaussian entries to have outliers. Our result implies a ``structural'' universality principle: the presence of outliers is only determined by the level of sparsity, rather than the detailed structure of the variance profile.

Submitted 25 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

arXiv:2311.12368 [pdf, ps, other]

Limiting spectral distribution of random self-adjoint quantum channels

Authors: Cécilia Lancien, Patrick Oliveira Santos, Pierre Youssef

Abstract: We study the limiting spectral distribution of quantum channels whose Kraus operators are sampled as $n\times n$ random Hermitian matrices satisfying certain assumptions. We show that when the Kraus rank goes to infinity with $n$, the limiting spectral distribution (suitably rescaled) of the corresponding quantum channel coincides with the semi-circle distribution. When the Kraus rank is fixed, the limiting spectral distribution is no longer the semi-circle distribution. It corresponds to an explicit law, which can also be described using tools from free probability.

Submitted 21 November, 2023; originally announced November 2023.

MSC Class: 81P45; 81P47; 60B20; 15B52; 46L54

arXiv:2310.16570 [pdf, other]

Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models

Authors: Paul Youssef, Osman Alperen Koraş, Meijie Li, Jörg Schlötterer, Christin Seifert

Abstract: Pre-trained Language Models (PLMs) are trained on vast unlabeled data, rich in world knowledge. This fact has sparked the interest of the community in quantifying the amount of factual knowledge present in PLMs, as this explains their performance on downstream tasks, and potentially justifies their use as knowledge bases. In this work, we survey methods and datasets that are used to probe PLMs for factual knowledge. Our contributions are: (1) We propose a categorization scheme for factual probing methods that is based on how their inputs, outputs and the probed PLMs are adapted; (2) We provide an overview of the datasets used for factual probing; (3) We synthesize insights about knowledge retention and prompt optimization in PLMs, analyze obstacles to adopting PLMs as knowledge bases and outline directions for future work.

Submitted 4 December, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted at EMNLP Findings 2023

arXiv:2307.12803 [pdf, other]

Guidance in Radiology Report Summarization: An Empirical Evaluation and Error Analysis

Authors: Jan Trienes, Paul Youssef, Jörg Schlötterer, Christin Seifert

Abstract: Automatically summarizing radiology reports into a concise impression can reduce the manual burden of clinicians and improve the consistency of reporting. Previous work aimed to enhance content selection and factuality through guided abstractive summarization. However, two key issues persist. First, current methods heavily rely on domain-specific resources to extract the guidance signal, limiting their transferability to domains and languages where those resources are unavailable. Second, while automatic metrics like ROUGE show progress, we lack a good understanding of the errors and failure modes in this task. To bridge these gaps, we first propose a domain-agnostic guidance signal in the form of variable-length extractive summaries. Our empirical results on two English benchmarks demonstrate that this guidance signal improves upon unguided summarization while being competitive with domain-specific methods. Additionally, we run an expert evaluation of four systems according to a taxonomy of 11 fine-grained errors. We find that the most pressing differences between automatic summaries and those of radiologists relate to content selection including omissions (up to 52%) and additions (up to 57%). We hypothesize that latent reporting factors and corpus-level inconsistencies may limit the ability of models to reliably learn content selection from the available data, presenting promising directions for future work.

Submitted 24 July, 2023; originally announced July 2023.

Comments: Accepted at INLG2023

arXiv:2304.06465 [pdf, ps, other]

Flat bands of periodic graphs

Authors: Mostafa Sabri, Pierre Youssef

Abstract: We study flat bands of periodic graphs in a Euclidean space. These are infinitely degenerate eigenvalues of the corresponding adjacency matrix, with eigenvectors of compact support. We provide some optimal recipes to generate desired bands, some sufficient conditions for a graph to have flat bands, we characterize the set of flat bands whose eigenvectors occupy a single cell and we compute the list of such bands for small cells. We next discuss stability and rarity of flat bands in special cases. Additional folklore results are proved and many questions are still open.

Submitted 27 April, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: 26 pages, 19 figures

MSC Class: 81Q10; 05C50

arXiv:2302.07772 [pdf, ps, other]

A note on quantum expanders

Authors: Cécilia Lancien, Pierre Youssef

Abstract: We prove that a wide class of random quantum channels with few Kraus operators, sampled as random matrices with some moment assumptions, exhibit a large spectral gap, and are therefore optimal quantum expanders. In particular, our result provides a recipe to construct random quantum expanders from their classical (random or deterministic) counterparts. This considerably enlarges the list of known constructions of optimal quantum expanders, which was previously limited to a few examples. Our proofs rely on recent progress in the study of the operator norm of random matrices with dependence and non-homogeneity, which we expect to have further applications in several areas of quantum information.

Submitted 23 February, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: 17 pages

MSC Class: 81P45; 81P47; 60B20; 15B52

arXiv:2212.06090 [pdf, other]

Monotonicity of the logarithmic energy for random matrices

Authors: Djalil Chafaï, Benjamin Dadoun, Pierre Youssef

Abstract: It is well-known that the semi-circle law, which is the limiting distribution in the Wigner theorem, is the minimizer of the logarithmic energy penalized by the second moment. A very similar fact holds for the Girko and Marchenko--Pastur theorems. In this work, we shed light on an intriguing phenomenon suggesting that this functional is monotonic along the mean empirical spectral distribution in terms of the matrix dimension. This is reminiscent of the monotonicity of the Boltzmann entropy along the Boltzmann equation, the monotonicity of the free energy along ergodic Markov processes, and the Shannon monotonicity of entropy or free entropy along the classical or free central limit theorem. While we only verify this monotonicity phenomenon for the Gaussian unitary ensemble, the complex Ginibre ensemble, and the square Laguerre unitary ensemble, numerical simulations suggest that it is actually more universal. We obtain along the way explicit formulas of the logarithmic energy of the mentioned models which can be of independent interest.

Submitted 8 April, 2024; v1 submitted 12 December, 2022; originally announced December 2022.

Comments: To appear in Random Matrices: Theory and Applications

MSC Class: 60B20; 15B52

arXiv:2212.06028 [pdf, ps, other]

Upgrading MLSI to LSI for reversible Markov chains

Authors: Justin Salez, Konstantin Tikhomirov, Pierre Youssef

Abstract: For reversible Markov chains on finite state spaces, we show that the modified log-Sobolev inequality (MLSI) can be upgraded to a log-Sobolev inequality (LSI) at the surprisingly low cost of degrading the associated constant by $\log (1/p)$, where $p$ is the minimum non-zero transition probability. We illustrate this by providing the first log-Sobolev estimate for Zero-Range processes on arbitrary graphs. As another application, we determine the modified log-Sobolev constant of the Lamplighter chain on all bounded-degree graphs, and use it to provide negative answers to two open questions by Montenegro and Tetali (2006) and Hermon and Peres (2018). Our proof builds upon the `regularization trick' recently introduced by the last two authors.

Submitted 12 December, 2022; originally announced December 2022.

Comments: 17 pages, comments welcome!

MSC Class: 60J27 (Primary) 60J46; 46E39 (Secondary)

arXiv:2208.00522 [pdf, other]

Online Decentralized Frank-Wolfe: From theoretical bound to applications in smart-building

Authors: Angan Mitra, Nguyen Kim Thang, Tuan-Anh Nguyen, Denis Trystram, Paul Youssef

Abstract: The design of decentralized learning algorithms is important in the fast-growing world in which data are distributed over participants with limited local computation resources and communication. In this direction, we propose an online algorithm minimizing non-convex loss functions aggregated from individual data/models distributed over a network. We provide the theoretical performance guarantee of our algorithm and demonstrate its utility on a real-life smart building.

Submitted 31 July, 2022; originally announced August 2022.

arXiv:2207.02057 [pdf, other]

Online 2-stage Stable Matching

Authors: Evripidis Bampis, Bruno Escoffier, Paul Youssef

Abstract: We focus on an online 2-stage problem, motivated by the following situation: consider a system where students shall be assigned to universities. There is a first round where some students apply, and a first (stable) matching $M_1$ has to be computed. However, some students may decide to leave the system (change their plan, go to a foreign university, or to some institution not in the system). Then, in a second round (after these deletions), we shall compute a second (final) stable matching $M_2$. As it is undesirable to change assignments, the goal is to minimize the number of divorces/modifications between the two stable matchings $M_1$ and $M_2$. Then, how should we choose $M_1$ and $M_2$? We show that there is an optimal online algorithm to solve this problem. In particular, thanks to a dominance property, we show that we can optimally compute $M_1$ without knowing the students that will leave the system. We generalize the result to some other possible modifications in the input (students, open positions). We also tackle the case of more stages, showing that no competitive (online) algorithm can be achieved for the considered problem as soon as there are 3 stages.

Submitted 2 May, 2023; v1 submitted 5 July, 2022; originally announced July 2022.
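
To make the two-stage setting in the abstract above concrete, here is a minimal sketch of a naive baseline. It is not the optimal online algorithm from the paper: it uses the standard student-proposing deferred-acceptance (Gale-Shapley) procedure, assumes one seat per university, recomputes the matching from scratch after a departure, and counts how many assignments changed. The function names, the preference lists, and the choice of departing student are all hypothetical.

```python
def gale_shapley(student_prefs, university_prefs):
    """Student-proposing deferred acceptance, one seat per university.

    Returns a dict mapping each matched student to a university.
    """
    rank = {u: {s: i for i, s in enumerate(prefs)}
            for u, prefs in university_prefs.items()}
    next_choice = {s: 0 for s in student_prefs}
    held = {}  # university -> student it currently holds
    free = list(student_prefs)
    while free:
        s = free.pop()
        if next_choice[s] >= len(student_prefs[s]):
            continue  # s exhausted its list and stays unmatched
        u = student_prefs[s][next_choice[s]]
        next_choice[s] += 1
        if s not in rank[u]:
            free.append(s)  # u does not rank s at all
        elif u not in held:
            held[u] = s
        elif rank[u][s] < rank[u][held[u]]:
            free.append(held[u])  # u trades up; the old student is free again
            held[u] = s
        else:
            free.append(s)
    return {s: u for u, s in held.items()}


def is_stable(matching, student_prefs, university_prefs):
    """Check that no student-university pair prefers each other to their match."""
    holder = {u: s for s, u in matching.items()}
    for s, prefs in student_prefs.items():
        for u in prefs:
            if matching.get(s) == u:
                break  # everything after this is less preferred by s
            u_prefs = university_prefs[u]
            if s not in u_prefs:
                continue
            if u not in holder or u_prefs.index(s) < u_prefs.index(holder[u]):
                return False
    return True


def count_changes(m1, m2):
    """Number of students matched in m2 whose assignment differs from m1."""
    return sum(1 for s, u in m2.items() if m1.get(s) != u)


# Hypothetical instance: three students, three universities.
students = {"a": ["X", "Y", "Z"], "b": ["X", "Z", "Y"], "c": ["Y", "X", "Z"]}
universities = {"X": ["b", "a", "c"], "Y": ["a", "c", "b"], "Z": ["a", "b", "c"]}
m1 = gale_shapley(students, universities)

# Stage 2: student "b" leaves; recompute from scratch on the remaining input.
students2 = {s: p for s, p in students.items() if s != "b"}
universities2 = {u: [s for s in p if s != "b"] for u, p in universities.items()}
m2 = gale_shapley(students2, universities2)
changes = count_changes(m1, m2)
```

In this small instance, recomputing from scratch moves both remaining students, which shows why the choice of $M_1$ and $M_2$ matters when the objective is to minimize modifications.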

arXiv:2206.12477 [pdf, ps, other]

Regularized modified log-Sobolev inequalities, and comparison of Markov chains

Authors: Konstantin Tikhomirov, Pierre Youssef

Abstract: In this work, we develop a comparison procedure for the Modified log-Sobolev Inequality (MLSI) constants of two reversible Markov chains on a finite state space. Efficient comparison of the MLSI Dirichlet forms is a well-known obstacle in the theory of Markov chains. We approach this problem by introducing a regularized MLSI constant which, under some assumptions, has the same order of magnitude as the usual MLSI constant yet is amenable to comparison and thus considerably simpler to estimate in certain cases. As an application of this general comparison procedure, we provide a sharp estimate of the MLSI constant of the switch chain on the set of simple bipartite regular graphs of size $n$ with a fixed degree $d$. Our estimate implies that the total variation mixing time of the switch chain is of order $O_d(n\log n)$. The result is optimal up to a multiple depending on $d$ and resolves a long-standing open problem. We expect that the MLSI comparison technique implemented in this paper will find further applications.

Submitted 24 June, 2022; originally announced June 2022.

arXiv:2011.03045 [pdf, ps, other]

Maximal correlation and monotonicity of free entropy and Stein discrepancy

Authors: Benjamin Dadoun, Pierre Youssef

Abstract: We introduce the maximal correlation coefficient $R(M_1,M_2)$ between two noncommutative probability subspaces $M_1$ and $M_2$ and show that the maximal correlation coefficient between the sub-algebras generated by $s_n:=x_1+\ldots +x_n$ and $s_m:=x_1+\ldots +x_m$ equals $\sqrt{m/n}$ for $m\le n$, where $(x_i)_{i\in \mathbb{N}}$ is a sequence of free and identically distributed noncommutative random variables. This is the free-probability analogue of a result by Dembo--Kagan--Shepp in classical probability. As an application, we use this estimate to provide another simple proof of the monotonicity of the free entropy and free Fisher information in the free central limit theorem. Moreover, we prove that the free Stein Discrepancy introduced by Fathi and Nelson is non-increasing along the free central limit theorem.

Submitted 8 February, 2023; v1 submitted 5 November, 2020; originally announced November 2020.

Journal ref: Electron. Commun. Probab. 26: 1-10 (2021)
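
The value $\sqrt{m/n}$ in the abstract above can be sanity-checked on the simplest pair of elements, $s_m$ and $s_n$ themselves. The computation below is only a sketch of the easy direction, under assumptions not stated in the abstract: the $x_i$ are taken to be self-adjoint and centered with variance $\sigma^2=\tau(x_1^2)>0$ for a trace $\tau$, and the maximal correlation is assumed to be a supremum of correlations over centered elements, so that any single pair gives a lower bound.

```latex
% Freeness of centered variables gives \tau(x_i x_j) = \tau(x_i)\tau(x_j) = 0 for i \neq j.
\begin{align*}
\tau(s_m s_n) &= \sum_{i=1}^{m}\sum_{j=1}^{n} \tau(x_i x_j)
               = \sum_{i=1}^{m} \tau(x_i^2) = m\sigma^2 \qquad (m \le n),\\
\tau(s_m^2) &= m\sigma^2, \qquad \tau(s_n^2) = n\sigma^2,\\
\frac{\tau(s_m s_n)}{\sqrt{\tau(s_m^2)\,\tau(s_n^2)}}
  &= \frac{m\sigma^2}{\sigma^2\sqrt{mn}} = \sqrt{\frac{m}{n}}.
\end{align*}
```

Under these assumptions the linear correlation already attains $\sqrt{m/n}$; the substance of the stated result is that no other pair of elements from the two sub-algebras does better.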

arXiv:2007.02729 [pdf, other]

Sharp Poincaré and log-Sobolev inequalities for the switch chain on regular bipartite graphs

Authors: Konstantin Tikhomirov, Pierre Youssef

Abstract: Consider the switch chain on the set of $d$-regular bipartite graphs on $n$ vertices with $3\leq d\leq n^{c}$, for a small universal constant $c>0$. We prove that the chain satisfies a Poincaré inequality with a constant of order $O(nd)$; moreover, when $d$ is fixed, we establish a log-Sobolev inequality for the chain with a constant of order $O_d(n\log n)$. We show that both results are optimal. The Poincaré inequality implies that in the regime $3\leq d\leq n^c$ the mixing time of the switch chain is at most $O\big((nd)^2 \log(nd)\big)$, improving on the previously known bound $O\big((nd)^{13} \log(nd)\big)$ due to Kannan, Tetali and Vempala and $O\big(n^7d^{18} \log(nd)\big)$ obtained by Dyer et al. The log-Sobolev inequality that we establish for constant $d$ implies a bound $O(n\log^2 n)$ on the mixing time of the chain which, up to the $\log n$ factor, captures a conjectured optimal bound. Our proof strategy relies on building, for any fixed function on the set of $d$-regular bipartite simple graphs, an appropriate extension to a function on the set of multigraphs given by the configuration model. We then establish a comparison procedure with the well studied random transposition model in order to obtain the corresponding functional inequalities. While our method falls into a rich class of comparison techniques for Markov chains on different state spaces, the crucial feature of the method - dealing with chains with a large distortion between their stationary measures - is a novel addition to the theory.

Submitted 22 May, 2022; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: revision
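
For readers unfamiliar with the switch chain studied in the abstract above, the following is a minimal sketch of one transition, assuming the usual formulation: pick two distinct edges uniformly at random and swap their right endpoints, rejecting the move if it would create a repeated edge. The function names, the circulant starting graph, and the convention of `n` vertices on each side are assumptions made for illustration, and nothing here reflects the proof techniques of the paper.

```python
import random
from collections import Counter


def switch_step(edges, rng):
    """Attempt one switch on a simple bipartite graph; mutate edges in place.

    edges is a set of (left, right) pairs. Returns True if the move was accepted.
    """
    (u1, v1), (u2, v2) = rng.sample(sorted(edges), 2)
    # A swap of edges sharing an endpoint would reproduce the same two edges.
    if u1 == u2 or v1 == v2:
        return False
    # Reject swaps that would create an edge already present (a multi-edge).
    if (u1, v2) in edges or (u2, v1) in edges:
        return False
    edges.difference_update({(u1, v1), (u2, v2)})
    edges.update({(u1, v2), (u2, v1)})
    return True


def circulant_bipartite(n, d):
    """A d-regular bipartite graph with n vertices per side (assumed start state)."""
    return {(i, (i + j) % n) for i in range(n) for j in range(d)}


def degrees(edges):
    """Return (left degree counts, right degree counts)."""
    left = Counter(u for u, _ in edges)
    right = Counter(v for _, v in edges)
    return left, right


if __name__ == "__main__":
    rng = random.Random(0)
    graph = circulant_bipartite(8, 3)
    accepted = sum(switch_step(graph, rng) for _ in range(200))
    print(accepted, degrees(graph))
```

Every accepted move removes one edge at each of the four endpoints involved and adds one back, so all degrees are preserved and the chain stays inside the set of simple $d$-regular bipartite graphs.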

arXiv:1911.10392 [pdf, other]

When is ACL's Deadline? A Scientific Conversational Agent

Authors: Mohsen Mesgar, Paul Youssef, Lin Li, Dominik Bierwirth, Yihao Li, Christian M. Meyer, Iryna Gurevych

Abstract: Our conversational agent UKP-ATHENA assists NLP researchers in finding and exploring scientific literature, identifying relevant authors, planning or post-processing conference visits, and preparing paper submissions using a unified interface based on natural language inputs and responses. UKP-ATHENA enables new access paths to our swiftly evolving research area with its massive amounts of scientific information and high turnaround times. UKP-ATHENA's responses connect information from multiple heterogeneous sources which researchers currently have to explore manually one after another. Unlike a search engine, UKP-ATHENA maintains the context of a conversation to allow for efficient information access on papers, researchers, and conferences. Our architecture consists of multiple components with reference implementations that can be easily extended by new skills and domains. Our user-based evaluation shows that UKP-ATHENA already responds to 45% of different formulations of defined intents with a 37% information coverage rate.

Submitted 23 November, 2019; originally announced November 2019.

arXiv:1910.13797 [pdf, ps, other]

Matrix Poincaré inequalities and concentration

Authors: Richard Aoun, Marwa Banna, Pierre Youssef

Abstract: We show that any probability measure satisfying a Matrix Poincaré inequality with respect to some reversible Markov generator satisfies an exponential matrix concentration inequality depending on the associated matrix carré du champ operator. This extends to the matrix setting a classical phenomenon in the scalar case. Moreover, the proof gives rise to new matrix trace inequalities which could be of independent interest. We then apply this general fact by establishing matrix Poincaré inequalities to derive matrix concentration inequalities for Gaussian measures, product measures and for Strong Rayleigh measures. The latter represents the first instance of matrix concentration for general matrix functions of negatively dependent random variables.

Submitted 30 May, 2020; v1 submitted 30 October, 2019; originally announced October 2019.

Comments: Final version, to appear in Advances in Mathematics

MSC Class: 60B20; 15A39; 47A63; 60J25

arXiv:1904.07985 [pdf, other]

Outliers in spectrum of sparse Wigner matrices

Authors: Konstantin Tikhomirov, Pierre Youssef

Abstract: In this paper, we study the effect of sparsity on the appearance of outliers in the semi-circular law. Let $(W_n)_{n=1}^\infty$ be a sequence of random symmetric matrices such that each $W_n$ is $n\times n$ with i.i.d entries above and on the main diagonal equidistributed with the product $b_nξ$, where $ξ$ is a real centered uniformly bounded random variable of unit variance and $b_n$ is an independent Bernoulli random variable with a probability of success $p_n$. Assuming that $\lim\limits_{n\to\infty}n p_n=\infty$, we show that for the random sequence $(ρ_n)_{n=1}^\infty$ given by $$ρ_n:=θ_n+\frac{n p_n}{θ_n},\quad θ_n:=\sqrt{\max\big(\max\limits_{i\leq n}\|{\rm Row_i}(W_n)\|_2^2-np_n,n p_n\big)},$$ the ratio $\frac{\|W_n\|}{ρ_n}$ converges to one in probability. A non-centered counterpart of the theorem allows to obtain asymptotic expressions for eigenvalues of the Erdős--Renyi graphs, which were unknown in the regime $n p_n=Θ(\log n)$. In particular, denoting by $A_n$ the adjacency matrix of $\mathcal{G}(n,p_n)$ and by $λ_{|k|}(A_n)$ its $k$-th largest (by the absolute value) eigenvalue, under the assumptions $\lim\limits_{n\to\infty }n p_n=\infty$ and $\lim\limits_{n\to\infty}p_n=0$ we have:
-(No non-trivial outliers) If $\liminf\frac{n p_n}{\log n}\geq\frac{1}{\log (4/e)}$ then for any fixed $k\geq2$, $\frac{|λ_{|k|}(A_n)|}{2\sqrt{n p_n}}$ converges to $1$ in probability.
-(Outliers) If $\limsup\frac{n p_n}{\log n}<\frac{1}{\log (4/e)}$ then there is $\varepsilon>0$ such that for any $k\in\mathbb{N}$, we have $\lim\limits_{n\to\infty}\mathbb{P}\Big\{\frac{|λ_{|k|}(A_n)|}{2\sqrt{n p_n}}>1+\varepsilon\Big\}=1$.
On a conceptual level, our result highlights similarities in appearance of outliers in spectrum of sparse matrices and the so-called BBP phase transition phenomenon in deformed Wigner matrices.

Submitted 23 May, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

Comments: Added reference to the related work arXiv:1905.03243

MSC Class: 15B52; 05C80; 60C05

arXiv:1901.02671 [pdf, other]

Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks

Authors: Steffen Eger, Paul Youssef, Iryna Gurevych

Abstract: Activation functions play a crucial role in neural networks because they are the nonlinearities which have been attributed to the success story of deep learning. One of the currently most popular activation functions is ReLU, but several competitors have recently been proposed or 'discovered', including LReLU functions and swish. While most works compare newly proposed activation functions on few tasks (usually from image classification) and against few competitors (usually ReLU), we perform the first large-scale comparison of 21 activation functions across eight different NLP tasks. We find that a largely unknown activation function performs most stably across all tasks, the so-called penalized tanh function. We also show that it can successfully replace the sigmoid and tanh gates in LSTM cells, leading to a 2 percentage point (pp) improvement over the standard choices on a challenging NLP task.

Submitted 9 January, 2019; originally announced January 2019.

Comments: Published at EMNLP 2018

arXiv:1801.05577 [pdf, ps, other]

The rank of random regular digraphs of constant degree

Authors: Alexander Litvak, Anna Lytova, Konstantin Tikhomirov, Nicole Tomczak-Jaegermann, Pierre Youssef

Abstract: Let $d$ be a fixed large integer. For any $n$ larger than $d$, let $A_n$ be the adjacency matrix of the random directed $d$-regular graph on $n$ vertices, with the uniform distribution. We show that $A_n$ has rank at least $n-1$ with probability going to one as $n$ goes to infinity. The proof combines the method of simple switchings and a recent result of the authors on delocalization of eigenvectors of $A_n$.

Submitted 18 July, 2018; v1 submitted 17 January, 2018; originally announced January 2018.

MSC Class: 60B20; 15B52; 46B06; 05C80

Journal ref: Journal of Complexity Volume 48, October 2018, Pages 103-110

arXiv:1801.05576 [pdf, ps, other]

Circular law for sparse random regular digraphs

Authors: Alexander Litvak, Anna Lytova, Konstantin Tikhomirov, Nicole Tomczak-Jaegermann, Pierre Youssef

Abstract: Fix a constant $C\geq 1$ and let $d=d(n)$ satisfy $d\leq \ln^{C} n$ for every large integer $n$. Denote by $A_n$ the adjacency matrix of a uniform random directed $d$-regular graph on $n$ vertices. We show that, as long as $d\to\infty$ with $n$, the empirical spectral distribution of appropriately rescaled matrix $A_n$ converges weakly in probability to the circular law. This result, together with an earlier work of Cook, completely settles the problem of weak convergence of the empirical distribution in directed $d$-regular setting with the degree tending to infinity. As a crucial element of our proof, we develop a technique of bounding intermediate singular values of $A_n$ based on studying random normals to rowspaces and on constructing a product structure to deal with the lack of independence between the matrix entries.

Submitted 21 January, 2018; v1 submitted 17 January, 2018; originally announced January 2018.

MSC Class: 60B20; 15B52; 46B06; 05C80

arXiv:1801.05575 [pdf, ps, other]

Structure of eigenvectors of random regular digraphs

Authors: Alexander Litvak, Anna Lytova, Konstantin Tikhomirov, Nicole Tomczak-Jaegermann, Pierre Youssef

Abstract: Let $d$ and $n$ be integers satisfying $C\leq d\leq \exp(c\sqrt{\ln n})$ for some universal constants $c, C>0$, and let $z\in \mathbb{C}$. Denote by $M$ the adjacency matrix of a random $d$-regular directed graph on $n$ vertices. In this paper, we study the structure of the kernel of submatrices of $M-z\,{\rm Id}$, formed by removing a subset of rows. We show that with large probability the kernel consists of two non-intersecting types of vectors, which we call very steep and gradual with many levels. As a corollary, we show, in particular, that every eigenvector of $M$, except for constant multiples of $(1,1,\dots,1)$, possesses a weak delocalization property: its level sets have cardinality less than $Cn\ln^2 d/\ln n$. For a large constant $d$ this provides a principally new structural information on eigenvectors, implying that the number of their level sets grows to infinity with $n$. As a key technical ingredient of our proofs we introduce a decomposition of $\mathbb{C}^n$ into vectors of different degrees of `structuredness', which is an alternative to the decomposition based on the least common denominator in the regime when the underlying random matrix is very sparse.

Submitted 25 October, 2018; v1 submitted 17 January, 2018; originally announced January 2018.

Comments: Accepted in Transactions of the American Mathematical Society

MSC Class: 60B20; 15B52; 46B06; 05C80

arXiv:1711.00807 [pdf, ps, other]

doi 10.1007/s00222-018-0817-x

The dimension-free structure of nonhomogeneous random matrices

Authors: Rafał Latała, Ramon van Handel, Pierre Youssef

Abstract: Let $X$ be a symmetric random matrix with independent but non-identically distributed centered Gaussian entries. We show that $$ \mathbf{E}\|X\|_{S_p} \asymp \mathbf{E}\Bigg[ \Bigg(\sum_i\Bigg(\sum_j X_{ij}^2\Bigg)^{p/2}\Bigg)^{1/p} \Bigg] $$ for any $2\le p\le\infty$, where $S_p$ denotes the $p$-Schatten class and the constants are universal. The right-hand side admits an explicit expression in terms of the variances of the matrix entries. This settles, in the case $p=\infty$, a conjecture of the first author, and provides a complete characterization of the class of infinite matrices with independent Gaussian entries that define bounded operators on $\ell_2$. Along the way, we obtain optimal dimension-free bounds on the moments $(\mathbf{E}\|X\|_{S_p}^p)^{1/p}$ that are of independent interest. We develop further extensions to non-symmetric matrices and to nonasymptotic moment and norm estimates for matrices with non-Gaussian entries that arise, for example, in the study of random graphs and in applied mathematics.

Submitted 21 August, 2018; v1 submitted 2 November, 2017; originally announced November 2017.

Comments: 36 pages, 2 figures

MSC Class: 60B20; 46B09; 46L53; 15B52

Journal ref: Invent. Math. 214 (2018), 1031-1080

arXiv:1707.02635 [pdf, ps, other]

doi 10.1007/s00440-018-0852-y

The smallest singular value of a shifted $d$-regular random square matrix

Authors: Alexander Litvak, Anna Lytova, Konstantin Tikhomirov, Nicole Tomczak-Jaegermann, Pierre Youssef

Abstract: We derive a lower bound on the smallest singular value of a random $d$-regular matrix, that is, the adjacency matrix of a random $d$-regular directed graph. More precisely, let $C_1<d< c_1 n/\log^2 n$ and let $\mathcal{M}_{n,d}$ be the set of all $0/1$-valued square $n\times n$ matrices such that each row and each column of a matrix $M\in \mathcal{M}_{n,d}$ has exactly $d$ ones. Let $M$ be uniformly distributed on $\mathcal{M}_{n,d}$. Then the smallest singular value $s_{n} (M)$ of $M$ is greater than $c_2 n^{-6}$ with probability at least $1-C_2\log^2 d/\sqrt{d}$, where $c_1$, $c_2$, $C_1$, and $C_2$ are absolute positive constants independent of any other parameters.

Submitted 18 July, 2018; v1 submitted 9 July, 2017; originally announced July 2017.

MSC Class: 60B20; 15B52; 46B06; 05C80

Journal ref: Probability Theory and Related Fields, 2018

arXiv:1610.01765 [pdf, ps, other]

The spectral gap of dense random regular graphs

Authors: Konstantin Tikhomirov, Pierre Youssef

Abstract: For any $α\in (0,1)$ and any $n^α\leq d\leq n/2$, we show that $λ(G)\leq C_α\sqrt{d}$ with probability at least $1-\frac{1}{n}$, where $G$ is the uniform random $d$-regular graph on $n$ vertices, $λ(G)$ denotes its second largest eigenvalue (in absolute value) and $C_α$ is a constant depending only on $α$. Combined with earlier results in this direction covering the case of sparse random graphs, this completely settles the problem of estimating the magnitude of $λ(G)$, up to a multiplicative constant, for all values of $n$ and $d$, confirming a conjecture of Vu. The result is obtained as a consequence of an estimate for the second largest singular value of adjacency matrices of random {\it directed} graphs with predefined degree sequences. As the main technical tool, we prove a concentration inequality for arbitrary linear forms on the space of matrices, where the probability measure is induced by the adjacency matrix of a random directed graph with prescribed degree sequences. The proof is a non-trivial application of the Freedman inequality for martingales, combined with bootstrapping and tensorization arguments. Our method bears considerable differences compared to the approach used by Broder, Frieze, Suen and Upfal (1999) who established the upper bound for $λ(G)$ for $d=o(\sqrt{n})$, and to the argument of Cook, Goldstein and Johnson (2015) who derived a concentration inequality for linear forms and estimated $λ(G)$ in the range $d= O(n^{2/3})$ using size-biased couplings.

Submitted 18 November, 2016; v1 submitted 6 October, 2016; originally announced October 2016.

Comments: Title changed, abstract shortened, references added, preliminaries merged, minor changes here and there

Journal ref: Annals of Probability, Volume 47, Number 1 (2019), 362-419

arXiv:1610.01751 [pdf, ps, other]

doi 10.1007/s10959-018-0844-y

On the norm of a random jointly exchangeable matrix

Authors: Konstantin Tikhomirov, Pierre Youssef

Abstract: In this note, we show that the norm of an $n\times n$ random jointly exchangeable matrix with zero diagonal can be estimated in terms of the norm of its $n/2\times n/2$ submatrix located in the top right corner. As a consequence, we prove a relation between the second largest singular values of a random matrix with constant row and column sums and its top right $n/2\times n/2$ submatrix. The result has an application to estimating the spectral gap of random undirected $d$-regular graphs in terms of the second singular value of {\it directed} random graphs with predefined degree sequences.

Submitted 18 November, 2016; v1 submitted 6 October, 2016; originally announced October 2016.

Comments: minor changes

Journal ref: Journal of Theoretical Probability, 2018

arXiv:1605.03861 [pdf, ps, other]

doi 10.1093/imrn/rnx206

Approximating matrices and convex bodies through Kadison-Singer

Authors: Omer Friedland, Pierre Youssef

Abstract: We show that any $n\times m$ matrix $A$ can be approximated in operator norm by a submatrix with a number of columns of order the stable rank of $A$. This improves on existing results by removing an extra logarithmic factor in the size of the extracted matrix. Our proof uses the recent solution of the Kadison-Singer problem. We also develop a sort of tensorization technique to deal with constraint approximation problems. As an application, we provide a sparsification result with equal weights and an optimal approximate John's decomposition for non-symmetric convex bodies. This enables us to show that any convex body in $\mathbb{R}^n$ is arbitrary close to another one having $O(n)$ contact points and fills the gap left in the literature after the results of Rudelson and Srivastava by completely answering the problem. As a consequence, we also show that the method developed by Guédon, Gordon and Meyer to establish the isomorphic Dvoretzky theorem yields to the best known result once we inject our improvements.

Submitted 24 January, 2017; v1 submitted 12 May, 2016; originally announced May 2016.

Comments: Changed the organization of the paper

Journal ref: International Mathematics Research Notices, 2017

arXiv:1601.00948 [pdf, ps, other]

Restricted invertibility revisited

Authors: Assaf Naor, Pierre Youssef

Abstract: Suppose that $m,n\in \mathbb{N}$ and that $A:\mathbb{R}^m\to \mathbb{R}^n$ is a linear operator. It is shown here that if $k,r\in \mathbb{N}$ satisfy $k<r\le \mathrm{\bf rank(A)}$ then there exists a subset $σ\subseteq \{1,\ldots,m\}$ with $|σ|=k$ such that the restriction of $A$ to $\mathbb{R}^σ\subseteq \mathbb{R}^m$ is invertible, and moreover the operator norm of the inverse $A^{-1}:A(\mathbb{R}^σ)\to \mathbb{R}^m$ is at most a constant multiple of the quantity $\sqrt{mr/((r-k)\sum_{i=r}^m \mathsf{s}_i(A)^2)}$, where $\mathsf{s}_1(A)\geqslant\ldots\geqslant \mathsf{s}_m(A)$ are the singular values of $A$. This improves over a series of works, starting from the seminal Bourgain--Tzafriri Restricted Invertibility Principle, through the works of Vershynin, Spielman--Srivastava and Marcus--Spielman--Srivastava. In particular, this directly implies an improved restricted invertibility principle in terms of Schatten--von Neumann norms.

Submitted 25 November, 2016; v1 submitted 5 January, 2016; originally announced January 2016.

Comments: Referee comments addressed. To appear in the collection of papers "Journey through Discrete Mathematics. A Tribute to Jiri Matousek" edited by Martin Loebl, Jaroslav Nesetril and Robin Thomas, due to be published by Springer

arXiv:1511.00113 [pdf, ps, other]

Adjacency matrices of random digraphs: singularity and anti-concentration

Authors: Alexander E. Litvak, Anna Lytova, Konstantin Tikhomirov, Nicole Tomczak-Jaegermann, Pierre Youssef

Abstract: Let ${\mathcal D}_{n,d}$ be the set of all $d$-regular directed graphs on $n$ vertices. Let $G$ be a graph chosen uniformly at random from ${\mathcal D}_{n,d}$ and $M$ be its adjacency matrix. We show that $M$ is invertible with probability at least $1-C\ln^{3} d/\sqrt{d}$ for $C\leq d\leq cn/\ln^2 n$, where $c, C$ are positive absolute constants. To this end, we establish a few properties of $d$-regular directed graphs. One of them, a Littlewood-Offord type anti-concentration property, is of independent interest. Let $J$ be a subset of vertices of $G$ with $|J|\approx n/d$. Let $δ_i$ be the indicator of the event that the vertex $i$ is connected to $J$ and define $δ= (δ_1, δ_2, ..., δ_n)\in \{0, 1\}^n$. Then for every $v\in\{0,1\}^n$ the probability that $δ=v$ is exponentially small. This property holds even if a part of the graph is "frozen".

Submitted 18 October, 2016; v1 submitted 31 October, 2015; originally announced November 2015.

Comments: Final version

MSC Class: 60C05; 60B20; 05C80; 15B52; 46B06

Journal ref: J. of Math. Analysis and Appl., 445 (2017), 1447--1491

arXiv:1504.05834 [pdf, ps, other]

doi 10.1142/S2010326316500064

Bernstein type inequality for a class of dependent random matrices

Authors: Marwa Banna, Florence Merlevède, Pierre Youssef

Abstract: In this paper we obtain a Bernstein type inequality for the sum of self-adjoint centered and geometrically absolutely regular random matrices with bounded largest eigenvalue. This inequality can be viewed as an extension to the matrix setting of the Bernstein-type inequality obtained by Merlevède et al. (2009) in the context of real-valued bounded random variables that are geometrically absolutely regular. The proofs rely on decoupling the Laplace transform of a sum on a Cantor-like set of random matrices.

Submitted 22 April, 2015; originally announced April 2015.

Comments: 22 pages

Journal ref: Random Matrices: Theory and Applications, 2016

arXiv:1504.01778 [pdf, ps, other]

Minimax of an n-dimensional Brownian motion

Authors: Konstantin Tikhomirov, Pierre Youssef

Abstract: For some absolute constants $c$, $n_0$ and any $n\geq n_0$, we show that with probability close to one the convex hull of the $n$-dimensional Brownian motion ${\rm conv}\{BM_n(t):\, t\in[1,2^{cn}]\}$ does not contain the origin. The result can be interpreted as an estimate of the minimax of the Gaussian process $\{ \langle \bar{u},BM_n(t)\rangle,\, \bar{u}\in S^{n-1},\, t\in [1,2^{cn}]\}$.

Submitted 7 April, 2015; originally announced April 2015.

arXiv:1410.0458 [pdf, ps, other]

When does a discrete-time random walk in $\mathbb{R}^n$ absorb the origin into its convex hull?

Authors: Konstantin Tikhomirov, Pierre Youssef

Abstract: We connect this question to a problem of estimating the probability that the image of certain random matrices does not intersect with a subset of the unit sphere $\mathbb{S}^{n-1}$. In this way, the case of a discretized Brownian motion is related to Gordon's escape theorem dealing with standard Gaussian matrices. The approach allows us to prove that with high probability, the $π/2$-covering time of certain random walks on $\mathbb{S}^{n-1}$ is of order $n$. For certain spherical simplices on $\mathbb{S}^{n-1}$, we extend the "escape" phenomenon to a broad class of random matrices; as an application, we show that $e^{Cn}$ steps are sufficient for the standard walk on $\mathbb{Z}^n$ to absorb the origin into its convex hull with a high probability.

Submitted 1 November, 2015; v1 submitted 2 October, 2014; originally announced October 2014.

Comments: Added the matching bound contained in the paper Minimax of an n-dimension Brownian motion which is withdrawn

Journal ref: Ann. Probab. Volume 45, Number 2 (2017), 965-1002

arXiv:1401.6434 [pdf, ps, other]

Extracting a basis with fixed block inside a matrix

Authors: Pierre Youssef

Abstract: Given $U$ an $n\times m$ matrix of rank $n$ and $V$ block of columns inside $U$, we consider the problem of extracting a block of columns of rank $n$ which minimize the Hilbert-Schmidt norm of the inverse while preserving the block $V$. This generalizes a previous result of Gluskin-Olevskii, and improves the estimates when given a "good" block $V$.

Submitted 13 November, 2015; v1 submitted 24 January, 2014; originally announced January 2014.

Journal ref: Linear Algebra Appl. 469 (2015), 28-38

arXiv:1301.6607 [pdf, ps, other]

Estimating the covariance of random matrices

Authors: Pierre Youssef

Abstract: We extend to the matrix setting a recent result of Srivastava-Vershynin about estimating the covariance matrix of a random vector. The result can be interpreted as a quantified version of the law of large numbers for positive semi-definite matrices which verify some regularity assumption. Beside giving examples, we discuss the notion of log-concave matrices and give estimates on the smallest and largest eigenvalues of a sum of such matrices.

Submitted 5 December, 2013; v1 submitted 28 January, 2013; originally announced January 2013.

Comments: 29 pages

Journal ref: Electron. J. Probab. 18 (2013), no. 107, 26 pp

arXiv:1212.0976 [pdf, ps, other]

doi 10.1093/imrn/rnt172

A note on column subset selection

Authors: Pierre Youssef

Abstract: Given a matrix U, using a deterministic method, we extract a "large" submatrix of U'(whose columns are obtained by normalizing those of U) and estimate its smallest and largest singular value. We apply this result to the study of contact points of the unit ball with its maximal volume ellipsoid. We consider also the paving problem and give a deterministic algorithm to partition a matrix into almost isometric blocks recovering previous results of Bourgain-Tzafriri and Tropp. Finally, we partially answer a question raised by Naor about finding an algorithm in the spirit of Batson-Spielman-Srivastava's work to extract a "large" square submatrix of "small" norm.

Submitted 25 October, 2013; v1 submitted 5 December, 2012; originally announced December 2012.

Comments: 12 pages International Mathematics Research Notices, 2013

Journal ref: Int. Math. Res. Not. IMRN 2014, no. 23, 6431-6447

arXiv:1206.0654 [pdf, ps, other]

doi 10.1112/S0025579313000144

Restricted Invertibility and the Banach-Mazur distance to the cube

Authors: Pierre Youssef

Abstract: We prove a normalized version of the restricted invertibility principle obtained by Spielman-Srivastava. Applying this result, we get a new proof of the proportional Dvoretzky-Rogers factorization theorem recovering the best current estimate. As a consequence, we also recover the best known estimate for the Banach-Mazur distance to the cube: the distance of every $n$-dimensional normed space from $\ell_{\infty}^n$ is at most $(2n)^{5/6}$. Finally, using tools from the work of Batson-Spielman-Srivastava, we give a new proof for a theorem of Kashin-Tzafriri on the norm of restricted matrices.

Submitted 12 June, 2013; v1 submitted 4 June, 2012; originally announced June 2012.

Comments: to appear in Mathematika

Journal ref: Mathematika 60 (2014) 201-218

arXiv:0901.0512 [pdf]

Expected Performance of the ATLAS Experiment - Detector, Trigger and Physics

Authors: The ATLAS Collaboration, G. Aad, E. Abat, B. Abbott, J. Abdallah, A. A. Abdelalim, A. Abdesselam, O. Abdinov, B. Abi, M. Abolins, H. Abramowicz, B. S. Acharya, D. L. Adams, T. N. Addy, C. Adorisio, P. Adragna, T. Adye, J. A. Aguilar-Saavedra, M. Aharrouche, S. P. Ahlen, F. Ahles, A. Ahmad, H. Ahmed, G. Aielli, T. Akdogan, et al. (2587 additional authors not shown)

Abstract: A detailed study is presented of the expected performance of the ATLAS detector. The reconstruction of tracks, leptons, photons, missing energy and jets is investigated, together with the performance of b-tagging and the trigger. The physics potential for a variety of interesting physics processes, within the Standard Model and beyond, is examined. The study comprises a series of notes based on simulations of the detector and physics processes, with particular emphasis given to the data expected from the first years of operation of the LHC at CERN.

Submitted 14 August, 2009; v1 submitted 28 December, 2008; originally announced January 2009.

Showing 1–40 of 40 results for author: Youssef, P