-
DECICE: Device-Edge-Cloud Intelligent Collaboration Framework
Authors:
Julian Kunkel,
Christian Boehme,
Jonathan Decker,
Fabrizio Magugliani,
Dirk Pleiter,
Bastian Koller,
Karthee Sivalingam,
Sabri Pllana,
Alexander Nikolov,
Mujdat Soyturk,
Christian Racca,
Andrea Bartolini,
Adrian Tate,
Berkay Yaman
Abstract:
DECICE is a Horizon Europe project that is developing an AI-enabled, open, and portable management framework for automatic and adaptive optimization and deployment of applications in the computing continuum, which ranges from IoT sensors on the Edge to large-scale Cloud / HPC computing infrastructures. In this paper, we describe the DECICE framework and architecture. Furthermore, we highlight use-cases for framework evaluation: intelligent traffic intersection, magnetic resonance imaging, and emergency response.
Submitted 4 May, 2023;
originally announced May 2023.
-
On the Gap between Hereditary Discrepancy and the Determinant Lower Bound
Authors:
Lily Li,
Aleksandar Nikolov
Abstract:
The determinant lower bound of Lovasz, Spencer, and Vesztergombi [European Journal of Combinatorics, 1986] is a powerful general way to prove lower bounds on the hereditary discrepancy of a set system. In their paper, Lovasz, Spencer, and Vesztergombi asked if hereditary discrepancy can also be bounded from above by a function of the determinant lower bound. This was answered in the negative by Hoffman, and the largest known multiplicative gap between the two quantities for a set system of $m$ subsets of a universe of size $n$ is on the order of $\max\{\log n, \sqrt{\log m}\}$. On the other hand, building on work of Matoušek [Proceedings of the AMS, 2013], recently Jiang and Reis [SOSA, 2022] showed that this gap is always bounded up to constants by $\sqrt{\log(m)\log(n)}$. This is tight when $m$ is polynomial in $n$, but leaves open what happens for large $m$. We show that the bound of Jiang and Reis is tight for nearly the entire range of $m$. Our proof relies on a technique of amplifying discrepancy via taking Kronecker products, and on discrepancy lower bounds for a set system derived from the discrete Haar basis.
Submitted 16 January, 2024; v1 submitted 14 March, 2023;
originally announced March 2023.
-
General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean Estimation
Authors:
Aleksandar Nikolov,
Haohua Tang
Abstract:
We investigate unbiased high-dimensional mean estimators in differential privacy. We consider differentially private mechanisms whose expected output equals the mean of the input dataset, for every dataset drawn from a fixed bounded $d$-dimensional domain $K$. A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it. In the first part of this paper, we study the optimal error achievable by a Gaussian noise mechanism for a given domain $K$ when the error is measured in the $\ell_p$ norm for some $p \ge 2$. We give algorithms that compute the optimal covariance for the Gaussian noise for a given $K$ under suitable assumptions, and prove a number of nice geometric properties of the optimal error. These results generalize the theory of factorization mechanisms from domains $K$ that are symmetric and finite (or, equivalently, symmetric polytopes) to arbitrary bounded domains.
In the second part of the paper we show that Gaussian noise mechanisms achieve nearly optimal error among all private unbiased mean estimation mechanisms in a very strong sense. In particular, for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error as the best Gaussian noise mechanism. We extend this result to local differential privacy, and to approximate differential privacy, but for the latter the error lower bound holds either for a dataset or for a neighboring dataset, and this relaxation is necessary.
Submitted 20 December, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
Learning versus Refutation in Noninteractive Local Differential Privacy
Authors:
Alexander Edmonds,
Aleksandar Nikolov,
Toniann Pitassi
Abstract:
We study two basic statistical tasks in non-interactive local differential privacy (LDP): learning and refutation. Learning requires finding a concept that best fits an unknown target function (from labelled samples drawn from a distribution), whereas refutation requires distinguishing between data distributions that are well-correlated with some concept in the class, versus distributions where the labels are random. Our main result is a complete characterization of the sample complexity of agnostic PAC learning for non-interactive LDP protocols. We show that the optimal sample complexity for any concept class is captured by the approximate $\gamma_2$ norm of a natural matrix associated with the class. Combined with previous work [Edmonds, Nikolov and Ullman, 2019] this gives an equivalence between learning and refutation in the agnostic setting.
Submitted 25 October, 2022;
originally announced October 2022.
-
Private Query Release via the Johnson-Lindenstrauss Transform
Authors:
Aleksandar Nikolov
Abstract:
We introduce a new method for releasing answers to statistical queries with differential privacy, based on the Johnson-Lindenstrauss lemma. The key idea is to randomly project the query answers to a lower dimensional space so that the distance between any two vectors of feasible query answers is preserved up to an additive error. Then we answer the projected queries using a simple noise-adding mechanism, and lift the answers up to the original dimension. Using this method, we give, for the first time, purely differentially private mechanisms with optimal worst case sample complexity under average error for answering a workload of $k$ queries over a universe of size $N$. As other applications, we give the first purely private efficient mechanisms with optimal sample complexity for computing the covariance of a bounded high-dimensional distribution, and for answering 2-way marginal queries. We also show that, up to the dependence on the error, a variant of our mechanism is nearly optimal for every given query workload.
Submitted 15 August, 2022;
originally announced August 2022.
-
OntoMerger: An Ontology Integration Library for Deduplicating and Connecting Knowledge Graph Nodes
Authors:
David Geleta,
Andriy Nikolov,
Mark ODonoghue,
Benedek Rozemberczki,
Anna Gogleva,
Valentina Tamma,
Terry R. Payne
Abstract:
Duplication of nodes is a common problem encountered when building knowledge graphs (KGs) from heterogeneous datasets, where it is crucial to be able to merge nodes having the same meaning. OntoMerger is a Python ontology integration library whose functionality is to deduplicate KG nodes. Our approach takes a set of KG nodes, mappings and disconnected hierarchies and generates a set of merged nodes together with a connected hierarchy. In addition, the library provides analytic and data testing functionalities that can be used to fine-tune the inputs, further reducing duplication, and to increase connectivity of the output graph. OntoMerger can be applied to a wide variety of ontologies and KGs. In this paper we introduce OntoMerger and illustrate its functionality on a real-world biomedical KG.
Submitted 5 June, 2022;
originally announced June 2022.
-
ChemicalX: A Deep Learning Library for Drug Pair Scoring
Authors:
Benedek Rozemberczki,
Charles Tapley Hoyt,
Anna Gogleva,
Piotr Grabowski,
Klas Karis,
Andrej Lamov,
Andriy Nikolov,
Sebastian Nilsson,
Michael Ughetto,
Yu Wang,
Tyler Derr,
Benjamin M Gyori
Abstract:
In this paper, we introduce ChemicalX, a PyTorch-based deep learning library designed for providing a range of state-of-the-art models to solve the drug pair scoring task. The primary objective of the library is to make deep drug pair scoring models accessible to machine learning researchers and practitioners in a streamlined framework. The design of ChemicalX reuses existing high-level model training utilities, geometric deep learning, and deep chemistry layers from the PyTorch ecosystem. Our system provides neural network layers, custom pair scoring architectures, data loaders, and batch iterators for end users. We showcase these features with example code snippets and case studies to highlight the characteristics of ChemicalX. A range of experiments on real-world drug-drug interaction, polypharmacy side effect, and combination synergy prediction tasks demonstrate that the models available in ChemicalX are effective at solving the pair scoring task. Finally, we show that ChemicalX could be used to train and score machine learning models on large drug pair datasets with hundreds of thousands of compounds on commodity hardware.
Submitted 26 May, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
A Unified View of Relational Deep Learning for Drug Pair Scoring
Authors:
Benedek Rozemberczki,
Stephen Bonner,
Andriy Nikolov,
Michael Ughetto,
Sebastian Nilsson,
Eliseo Papa
Abstract:
In recent years, numerous machine learning models which attempt to solve polypharmacy side effect identification, drug-drug interaction prediction and combination therapy design tasks have been proposed. Here, we present a unified theoretical view of relational machine learning models which can address these tasks. We provide fundamental definitions, compare existing model architectures and discuss performance metrics, datasets and evaluation protocols. In addition, we emphasize possible high impact applications and important future research directions in this domain.
Submitted 11 December, 2021; v1 submitted 4 November, 2021;
originally announced November 2021.
-
MOOMIN: Deep Molecular Omics Network for Anti-Cancer Drug Combination Therapy
Authors:
Benedek Rozemberczki,
Anna Gogleva,
Sebastian Nilsson,
Gavin Edwards,
Andriy Nikolov,
Eliseo Papa
Abstract:
We propose the molecular omics network (MOOMIN), a multimodal graph neural network used by AstraZeneca oncologists to predict the synergy of drug combinations for cancer treatment. Our model learns drug representations at multiple scales based on a drug-protein interaction network and metadata. Structural properties of compounds and proteins are encoded to create vertex features for a message-passing scheme that operates on the bipartite interaction graph. Propagated messages form multi-resolution drug representations, which we use to create drug pair descriptors. By conditioning the drug combination representations on the cancer cell type, we define a synergy scoring function that can inductively score unseen pairs of drugs. Experimental results on the synergy scoring task demonstrate that MOOMIN outperforms state-of-the-art graph fingerprinting, proximity preserving node embedding, and existing deep learning approaches. Further results establish that the predictive performance of our model is robust to hyperparameter changes. We demonstrate that the model makes high-quality predictions over a wide range of cancer cell line tissues, that out-of-sample predictions can be validated with external synergy databases, and that the proposed model is data efficient at learning.
Submitted 8 August, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News
Authors:
Preslav Nakov,
Giovanni Da San Martino,
Tamer Elsayed,
Alberto Barrón-Cedeño,
Rubén Míguez,
Shaden Shaar,
Firoj Alam,
Fatima Haouari,
Maram Hasanain,
Watheq Mansour,
Bayan Hamdan,
Zien Sheikh Ali,
Nikolay Babulkov,
Alex Nikolov,
Gautam Kishore Shahi,
Julia Maria Struß,
Thomas Mandl,
Mucahid Kutlu,
Yavuz Selim Kartal
Abstract:
We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish. Task 1 asks to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics (in all five languages). Task 2 asks to determine whether a claim in a tweet can be verified using a set of previously fact-checked claims (in Arabic and English). Task 3 asks to predict the veracity of a news article and its topical domain (in English). The evaluation is based on mean average precision or precision at rank k for the ranking tasks, and macro-F1 for the classification tasks. This was the most popular CLEF-2021 lab in terms of team registrations: 132 teams. Nearly one-third of them participated: 15, 5, and 25 teams submitted official runs for tasks 1, 2, and 3, respectively.
Submitted 23 September, 2021;
originally announced September 2021.
-
Findings of the NLP4IF-2021 Shared Tasks on Fighting the COVID-19 Infodemic and Censorship Detection
Authors:
Shaden Shaar,
Firoj Alam,
Giovanni Da San Martino,
Alex Nikolov,
Wajdi Zaghouani,
Preslav Nakov,
Anna Feldman
Abstract:
We present the results and the main findings of the NLP4IF-2021 shared tasks. Task 1 focused on fighting the COVID-19 infodemic in social media, and it was offered in Arabic, Bulgarian, and English. Given a tweet, it asked to predict whether that tweet contains a verifiable claim, and if so, whether it is likely to be false, is of general interest, is likely to be harmful, and is worthy of manual fact-checking; also, whether it is harmful to society, and whether it requires the attention of policy makers. Task 2 focused on censorship detection, and was offered in Chinese. A total of ten teams submitted systems for task 1, and one team participated in task 2; nine teams also submitted a system description paper. Here, we present the tasks, analyze the results, and discuss the system submissions and the methods they used. Most submissions achieved sizable improvements over several baselines, and the best systems used pre-trained Transformers and ensembles. The data, the scorers and the leaderboards for the tasks are available at http://gitlab.com/NLP4IF/nlp4if-2021.
Submitted 23 September, 2021;
originally announced September 2021.
-
Near Neighbor Search via Efficient Average Distortion Embeddings
Authors:
Deepanshu Kush,
Aleksandar Nikolov,
Haohua Tang
Abstract:
A recent series of papers by Andoni, Naor, Nikolov, Razenshteyn, and Waingarten (STOC 2018, FOCS 2018) has given approximate near neighbour search (NNS) data structures for a wide class of distance metrics, including all norms. In particular, these data structures achieve approximation on the order of $p$ for $\ell_p^d$ norms with space complexity nearly linear in the dataset size $n$ and polynomial in the dimension $d$, and query time sub-linear in $n$ and polynomial in $d$. The main shortcoming is the exponential in $d$ pre-processing time required for their construction.
In this paper, we describe a more direct framework for constructing NNS data structures for general norms. More specifically, we show via an algorithmic reduction that an efficient NNS data structure for a given metric is implied by an efficient average distortion embedding of it into $\ell_1$ or into Euclidean space. In particular, the resulting data structures require only polynomial pre-processing time, as long as the embedding can be computed in polynomial time. As a concrete instantiation of this framework, we give an NNS data structure for $\ell_p$ with efficient pre-processing that matches the approximation factor, space and query complexity of the aforementioned data structure of Andoni et al. On the way, we resolve a question of Naor (Analysis and Geometry in Metric Spaces, 2014) and provide an explicit, efficiently computable embedding of $\ell_p$, for $p \ge 2$, into $\ell_2$ with (quadratic) average distortion on the order of $p$. We expect our approach to pave the way for constructing efficient NNS data structures for all norms.
Submitted 10 May, 2021;
originally announced May 2021.
-
Team Alex at CLEF CheckThat! 2020: Identifying Check-Worthy Tweets With Transformer Models
Authors:
Alex Nikolov,
Giovanni Da San Martino,
Ivan Koychev,
Preslav Nakov
Abstract:
While misinformation and disinformation have been thriving in social media for years, with the emergence of the COVID-19 pandemic, the political and the health misinformation merged, thus elevating the problem to a whole new level and giving rise to the first global infodemic. The fight against this infodemic has many aspects, with fact-checking and debunking false and misleading claims being among the most important ones. Unfortunately, manual fact-checking is time-consuming and automatic fact-checking is resource-intensive, which means that we need to pre-filter the input social media posts and to throw out those that do not appear to be check-worthy. With this in mind, here we propose a model for detecting check-worthy tweets about COVID-19, which combines deep contextualized text representations with modeling the social context of the tweet. We further describe a number of additional experiments and comparisons, which we believe should be useful for future research as they provide some indication about what techniques are effective for the task. Our official submission to the English version of CLEF-2020 CheckThat! Task 1, system Team_Alex, was ranked second with a MAP score of 0.8034, which is almost tied with the winning system, lagging behind by just 0.003 MAP points absolute.
Submitted 7 September, 2020;
originally announced September 2020.
-
On the Computational Complexity of Linear Discrepancy
Authors:
Lily Li,
Aleksandar Nikolov
Abstract:
Many problems in computer science and applied mathematics require rounding a vector $\mathbf{w}$ of fractional values lying in the interval $[0,1]$ to a binary vector $\mathbf{x}$ so that, for a given matrix $\mathbf{A}$, $\mathbf{A}\mathbf{x}$ is as close to $\mathbf{A}\mathbf{w}$ as possible. For example, this problem arises in LP rounding algorithms used to approximate $\mathsf{NP}$-hard optimization problems and in the design of uniformly distributed point sets for numerical integration. For a given matrix $\mathbf{A}$, the worst-case error over all choices of $\mathbf{w}$ incurred by the best possible rounding is measured by the linear discrepancy of $\mathbf{A}$, a quantity studied in discrepancy theory, and introduced by Lovasz, Spencer, and Vesztergombi (EJC, 1986).
We initiate the study of the computational complexity of linear discrepancy. Our investigation proceeds in two directions: (1) proving hardness results and (2) finding both exact and approximate algorithms to evaluate the linear discrepancy of certain matrices. For (1), we show that linear discrepancy is $\mathsf{NP}$-hard. Thus we do not expect to find an efficient exact algorithm for the general case. Restricting our attention to matrices with a constant number of rows, we present a poly-time exact algorithm for matrices consisting of a single row and matrices with a constant number of rows and entries of bounded magnitude. We also present an exponential-time approximation algorithm for general matrices, and an algorithm that approximates linear discrepancy to within an exponential factor.
Submitted 31 July, 2020;
originally announced August 2020.
-
Synergistic effect in two-phase laser procedure for production of silver nanoparticles colloids applicable in ophthalmology
Authors:
A. S. Nikolov,
N. E. Stankova,
D. B. Karashanova,
N. N. Nedyalkov,
E. L. Pavlov,
K. Tz. Koev,
Hr. Najdenski,
V. Kussovski,
L. A. Avramov,
C. Ristoscu,
M. Badiceanu,
I. N. Mihailescu
Abstract:
This work reports on the production of Ag nanoparticles (AgNPs) in water solution based upon a two-phase pulsed laser procedure for ophthalmological therapeutic approaches. In this case, the AgNPs should be less than 10 nm and have a narrow size distribution. Nanoparticles of this size scale are capable of penetrating the complex ocular barriers, ensuring effective non-invasive drug delivery to the retina. In the first phase, AgNPs larger than 20 nm were fabricated via laser ablation of an Ag target under water by irradiation with a fundamental wavelength of 1064 nm generated by a Nd:YAG laser. During the second phase, to reduce the mean size of the as-obtained nanoparticles and properly adjust the size distribution, the water colloids were additionally irradiated by ultraviolet harmonics (355 nm and 266 nm) from the same laser source. The effect of the key laser parameters - wavelength, fluence and laser exposure time - upon the nanoparticle morphology was studied. The most suitable post-ablation treatment of the initial colloids was obtained by consecutive irradiation with the third (355 nm) and the fourth (266 nm) harmonics of the fundamental laser wavelength. By using this approach, a synergistic effect between two mechanisms of light absorption by AgNPs was induced. As a result, contaminant-free colloids of AgNPs with a size below 10 nm and a quite narrow size distribution with a standard deviation of 1.6 nm were fabricated. The toxic effect of the as-produced AgNPs on Gram-positive and Gram-negative bacteria and Candida albicans was explored. The most efficient action was reached against Pseudomonas aeruginosa and Escherichia coli. Potential application of the synthesized AgNP colloidal aqueous solutions with antimicrobial action as a non-invasive method for the prevention and treatment of ocular infections was proposed.
Submitted 27 July, 2020;
originally announced July 2020.
-
Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media
Authors:
Alberto Barron-Cedeno,
Tamer Elsayed,
Preslav Nakov,
Giovanni Da San Martino,
Maram Hasanain,
Reem Suwaileh,
Fatima Haouari,
Nikolay Babulkov,
Bayan Hamdan,
Alex Nikolov,
Shaden Shaar,
Zien Sheikh Ali
Abstract:
We present an overview of the third edition of the CheckThat! Lab at CLEF 2020. The lab featured five tasks in two different languages: English and Arabic. The first four tasks compose the full pipeline of claim verification in social media: Task 1 on check-worthiness estimation, Task 2 on retrieving previously fact-checked claims, Task 3 on evidence retrieval, and Task 4 on claim verification. The lab is completed with Task 5 on check-worthiness estimation in political debates and speeches. A total of 67 teams registered to participate in the lab (up from 47 at CLEF 2019), and 23 of them actually submitted runs (compared to 14 at CLEF 2019). Most teams used deep neural networks based on BERT, LSTMs, or CNNs, and achieved sizable improvements over the baselines on all tasks. Here we describe the tasks setup, the evaluation results, and a summary of the approaches used by the participants, and we discuss some lessons learned. Last but not least, we release to the research community all datasets from the lab as well as the evaluation scripts, which should enable further research in the important tasks of check-worthiness estimation and automatic claim verification.
Submitted 15 July, 2020;
originally announced July 2020.
-
Fighting the COVID-19 Infodemic in Social Media: A Holistic Perspective and a Call to Arms
Authors:
Firoj Alam,
Fahim Dalvi,
Shaden Shaar,
Nadir Durrani,
Hamdy Mubarak,
Alex Nikolov,
Giovanni Da San Martino,
Ahmed Abdelali,
Hassan Sajjad,
Kareem Darwish,
Preslav Nakov
Abstract:
With the outbreak of the COVID-19 pandemic, people turned to social media to read and to share timely information including statistics, warnings, advice, and inspirational stories. Unfortunately, alongside all this useful information, there was also a new blending of medical and political misinformation and disinformation, which gave rise to the first global infodemic. While fighting this infodemic is typically thought of in terms of factuality, the problem is much broader as malicious content includes not only fake news, rumors, and conspiracy theories, but also promotion of fake cures, panic, racism, xenophobia, and mistrust in the authorities, among others. This is a complex problem that needs a holistic approach combining the perspectives of journalists, fact-checkers, policymakers, government entities, social media platforms, and society as a whole. Taking them into account we define an annotation schema and detailed annotation instructions, which reflect these perspectives. We performed initial annotations using this schema, and our initial experiments demonstrated sizable improvements over the baselines. Now, we issue a call to arms to the research community and beyond to join the fight by supporting our crowdsourcing annotation efforts.
Submitted 9 April, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Fighting the COVID-19 Infodemic: Modeling the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the Society
Authors:
Firoj Alam,
Shaden Shaar,
Fahim Dalvi,
Hassan Sajjad,
Alex Nikolov,
Hamdy Mubarak,
Giovanni Da San Martino,
Ahmed Abdelali,
Nadir Durrani,
Kareem Darwish,
Abdulaziz Al-Homaid,
Wajdi Zaghouani,
Tommaso Caselli,
Gijs Danoe,
Friso Stolk,
Britt Bruntink,
Preslav Nakov
Abstract:
With the emergence of the COVID-19 pandemic, the political and the medical aspects of disinformation merged as the problem got elevated to a whole new level to become the first global infodemic. Fighting this infodemic has been declared one of the most important focus areas of the World Health Organization, with dangers ranging from promoting fake cures, rumors, and conspiracy theories to spreading xenophobia and panic. Addressing the issue requires solving a number of challenging problems such as identifying messages containing claims, determining their check-worthiness and factuality, and their potential to do harm as well as the nature of that harm, to mention just a few. To address this gap, we release a large dataset of 16K manually annotated tweets for fine-grained disinformation analysis that (i) focuses on COVID-19, (ii) combines the perspectives and the interests of journalists, fact-checkers, social media platforms, policy makers, and society, and (iii) covers Arabic, Bulgarian, Dutch, and English. Finally, we show strong evaluation results using pretrained Transformers, thus confirming the practical utility of the dataset in monolingual vs. multilingual, and single task vs. multitask settings.
Submitted 22 September, 2021; v1 submitted 30 April, 2020;
originally announced May 2020.
-
Private Query Release Assisted by Public Data
Authors:
Raef Bassily,
Albert Cheu,
Shay Moran,
Aleksandar Nikolov,
Jonathan Ullman,
Zhiwei Steven Wu
Abstract:
We study the problem of differentially private query release assisted by access to public data. In this problem, the goal is to answer a large class $\mathcal{H}$ of statistical queries with error no more than $α$ using a combination of public and private samples. The algorithm is required to satisfy differential privacy only with respect to the private samples. We study the limits of this task in terms of the private and public sample complexities.
First, we show that we can solve the problem for any query class $\mathcal{H}$ of finite VC-dimension using only $d/α$ public samples and $\sqrt{p}d^{3/2}/α^2$ private samples, where $d$ and $p$ are the VC-dimension and dual VC-dimension of $\mathcal{H}$, respectively. In comparison, with only private samples, this problem cannot be solved even for simple query classes with VC-dimension one, and without any private samples, a larger public sample of size $d/α^2$ is needed. Next, we give sample complexity lower bounds that exhibit tight dependence on $p$ and $α$. For the class of decision stumps, we give a lower bound of $\sqrt{p}/α$ on the private sample complexity whenever the public sample size is less than $1/α^2$. Given our upper bounds, this shows that the dependence on $\sqrt{p}$ is necessary in the private sample complexity. We also give a lower bound of $1/α$ on the public sample complexity for a broad family of query classes, which by our upper bound, is tight in $α$.
Submitted 22 April, 2020;
originally announced April 2020.
-
Maximizing Determinants under Matroid Constraints
Authors:
Vivek Madan,
Aleksandar Nikolov,
Mohit Singh,
Uthaipon Tantipongpipat
Abstract:
Given vectors $v_1,\dots,v_n\in\mathbb{R}^d$ and a matroid $M=([n],I)$, we study the problem of finding a basis $S$ of $M$ such that $\det(\sum_{i \in S}v_i v_i^\top)$ is maximized. This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning. The current best results include an $e^{2k}$-estimation for any matroid of rank $k$ and a $(1+ε)^d$-approximation for a uniform matroid of rank $k\ge d+\frac dε$, where the rank $k\ge d$ denotes the desired size of the optimal set. Our main result is a new approximation algorithm with an approximation guarantee that depends only on the dimension $d$ of the vectors and not on the size $k$ of the output set. In particular, we show an $(O(d))^{d}$-estimation and an $(O(d))^{d^3}$-approximation for any matroid, giving a significant improvement over prior work when $k\gg d$.
Our result relies on the existence of an optimal solution to a convex programming relaxation for the problem which has sparse support; in particular, no more than $O(d^2)$ variables of the solution have fractional values. The sparsity results rely on the interplay between the first-order optimality conditions for the convex program and matroid theory. We believe that the techniques introduced to show sparsity of optimal solutions to convex programs will be of independent interest. We also give a randomized algorithm that rounds a sparse fractional solution to a feasible integral solution to the original problem. To show the approximation guarantee, we utilize recent works on strongly log-concave polynomials and show new relationships between different convex programs studied for the problem. Finally, we use the estimation algorithm and sparsity results to give an efficient deterministic approximation algorithm with an approximation guarantee that depends solely on the dimension $d$.
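To make the objective concrete, here is a minimal brute-force sketch that enumerates candidate bases and evaluates $\det(\sum_{i \in S} v_i v_i^\top)$ for each. It is not the approximation algorithm from the paper, and it is exponential in $n$. The names `det`, `objective`, `max_det_basis`, and the `is_basis` predicate (a stand-in for a matroid basis oracle) are assumptions made for illustration only.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (fine for tiny d)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def objective(vectors, subset):
    """det of the d x d matrix sum_{i in subset} v_i v_i^T."""
    d = len(vectors[0])
    gram = [
        [sum(vectors[i][a] * vectors[i][b] for i in subset) for b in range(d)]
        for a in range(d)
    ]
    return det(gram)

def max_det_basis(vectors, k, is_basis=lambda subset: True):
    """Best (value, subset) over all k-subsets accepted by the hypothetical oracle."""
    candidates = [s for s in combinations(range(len(vectors)), k) if is_basis(s)]
    return max((objective(vectors, s), s) for s in candidates)

vectors = [[2, 0], [0, 3], [1, 1]]
print(max_det_basis(vectors, 2))  # uniform matroid of rank 2
```

With the default predicate every $k$-subset counts as a basis, which corresponds to a uniform matroid; passing a different `is_basis` restricts the search to the bases of another matroid.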
Submitted 16 April, 2020;
originally announced April 2020.
-
Locally Private Hypothesis Selection
Authors:
Sivakanth Gopi,
Gautam Kamath,
Janardhan Kulkarni,
Aleksandar Nikolov,
Zhiwei Steven Wu,
Huanyu Zhang
Abstract:
We initiate the study of hypothesis selection under local differential privacy. Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-local differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. This is a generalization of the classic problem of $k$-wise simple hypothesis testing, which corresponds to when $p \in \mathcal{Q}$, and we wish to identify $p$. Absent privacy constraints, this problem requires $O(\log k)$ samples from $p$, and it was recently shown that the same complexity is achievable under (central) differential privacy. However, the naive approach to this problem under local differential privacy would require $\tilde O(k^2)$ samples.
We first show that the constraint of local differential privacy incurs an exponential increase in cost: any algorithm for this problem requires at least $Ω(k)$ samples. Second, for the special case of $k$-wise simple hypothesis testing, we provide a non-interactive algorithm which nearly matches this bound, requiring $\tilde O(k)$ samples. Finally, we provide sequentially interactive algorithms for the general case, requiring $\tilde O(k)$ samples and only $O(\log \log k)$ rounds of interactivity. Our algorithms are achieved through a reduction to maximum selection with adversarial comparators, a problem of independent interest for which we initiate study in the parallel setting. For this problem, we provide a family of algorithms for each number of allowed rounds of interaction $t$, as well as lower bounds showing that they are near-optimal for every $t$. Notably, our algorithms result in exponential improvements on the round complexity of previous methods.
Submitted 19 June, 2020; v1 submitted 21 February, 2020;
originally announced February 2020.
-
The Power of Factorization Mechanisms in Local and Central Differential Privacy
Authors:
Alexander Edmonds,
Aleksandar Nikolov,
Jonathan Ullman
Abstract:
We give new characterizations of the sample complexity of answering linear queries (statistical queries) in the local and central models of differential privacy:
* In the non-interactive local model, we give the first approximate characterization of the sample complexity. Informally our bounds are tight to within polylogarithmic factors in the number of queries and desired accuracy. Our characterization extends to agnostic learning in the local model.
* In the central model, we give a characterization of the sample complexity in the high-accuracy regime that is analogous to that of Nikolov, Talwar, and Zhang (STOC 2013), but is both quantitatively tighter and has a dramatically simpler proof.
Our lower bounds apply equally to the empirical and population estimation problems. In both cases, our characterizations show that a particular factorization mechanism is approximately optimal, and the optimal sample complexity is bounded from above and below by well studied factorization norms of a matrix associated with the queries.
Submitted 19 November, 2019;
originally announced November 2019.
-
Preconditioning for the Geometric Transportation Problem
Authors:
Andrey Boris Khesin,
Aleksandar Nikolov,
Dmitry Paramonov
Abstract:
In the geometric transportation problem, we are given a collection of points $P$ in $d$-dimensional Euclidean space, and each point is given a supply of $μ(p)$ units of mass, where $μ(p)$ could be a positive or a negative integer, and the total sum of the supplies is $0$. The goal is to find a flow (called a transportation map) that transports $μ(p)$ units from any point $p$ with $μ(p) > 0$, and transports $-μ(p)$ units into any point $p$ with $μ(p) < 0$. Moreover, the flow should minimize the total distance traveled by the transported mass. The optimal value is known as the transportation cost, or the Earth Mover's Distance (from the points with positive supply to those with negative supply). This problem has been widely studied in many fields of computer science: from theoretical work in computational geometry, to applications in computer vision, graphics, and machine learning.
In this work we study approximation algorithms for the geometric transportation problem. We give an algorithm which, for any fixed dimension $d$, finds a $(1+\varepsilon)$-approximate transportation map in time nearly-linear in $n$, and polynomial in $\varepsilon^{-1}$ and in the logarithm of the total supply. This is the first approximation scheme for the problem whose running time depends on $n$ as $n\cdot \mathrm{polylog}(n)$. Our techniques combine the generalized preconditioning framework of Sherman, which is grounded in continuous optimization, with simple geometric arguments to first reduce the problem to a minimum cost flow problem on a sparse graph, and then to design a good preconditioner for this latter problem.
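As a small illustration of the problem definition (not of the nearly-linear-time algorithm described above), the sketch below computes the exact transportation cost for a tiny instance by brute force. It splits each integer supply into unit masses and tries every assignment of unit sources to unit sinks, which is valid because the transportation problem with integer supplies has an integral optimal flow. The function name `transportation_cost` is an assumption, and the running time is factorial in the total supply.

```python
from itertools import permutations
from math import dist

def transportation_cost(points, supply):
    """Exact Earth Mover's Distance for integer supplies that sum to zero."""
    # One copy of a point per unit of positive supply (source) or demand (sink).
    sources = [p for p, m in zip(points, supply) for _ in range(max(m, 0))]
    sinks = [p for p, m in zip(points, supply) for _ in range(max(-m, 0))]
    assert len(sources) == len(sinks), "supplies must sum to zero"
    # Try every way of matching unit sources to unit sinks and keep the cheapest.
    return min(
        sum(dist(s, t) for s, t in zip(sources, perm))
        for perm in permutations(sinks)
    )

points = [(0, 0), (1, 0), (3, 0)]
supply = [2, -1, -1]  # two units leave (0, 0); one unit arrives at each other point
print(transportation_cost(points, supply))
```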
Submitted 22 February, 2019;
originally announced February 2019.
-
On Mean Estimation for General Norms with Statistical Queries
Authors:
Jerry Li,
Aleksandar Nikolov,
Ilya Razenshteyn,
Erik Waingarten
Abstract:
We study the problem of mean estimation for high-dimensional distributions, assuming access to a statistical query oracle for the distribution. For a normed space $X = (\mathbb{R}^d, \|\cdot\|_X)$ and a distribution supported on vectors $x \in \mathbb{R}^d$ with $\|x\|_{X} \leq 1$, the task is to output an estimate $\hatμ \in \mathbb{R}^d$ which is $ε$-close in the distance induced by $\|\cdot\|_X$ to the true mean of the distribution. We obtain sharp upper and lower bounds for the statistical query complexity of this problem when the underlying norm is symmetric as well as for Schatten-$p$ norms, answering two questions raised by Feldman, Guzmán, and Vempala (SODA 2017).
Submitted 6 February, 2019;
originally announced February 2019.
-
Sticky Brownian Rounding and its Applications to Constraint Satisfaction Problems
Authors:
Sepehr Abbasi-Zadeh,
Nikhil Bansal,
Guru Guruganesh,
Aleksandar Nikolov,
Roy Schwartz,
Mohit Singh
Abstract:
Semidefinite programming is a powerful tool in the design and analysis of approximation algorithms for combinatorial optimization problems. In particular, the random hyperplane rounding method of Goemans and Williamson has been extensively studied for more than two decades, resulting in various extensions to the original technique and beautiful algorithms for a wide range of applications. Despite the fact that this approach yields tight approximation guarantees for some problems, e.g., Max-Cut, for many others, e.g., Max-SAT and Max-DiCut, the tight approximation ratio is still unknown. One of the main reasons for this is the fact that very few techniques for rounding semidefinite relaxations are known.
In this work, we present a new general and simple method for rounding semi-definite programs, based on Brownian motion. Our approach is inspired by recent results in algorithmic discrepancy theory. We develop and present tools for analyzing our new rounding algorithms, utilizing mathematical machinery from the theory of Brownian motion, complex analysis, and partial differential equations. Focusing on constraint satisfaction problems, we apply our method to several classical problems, including Max-Cut, Max-2SAT, and Max-DiCut, and derive new algorithms that are competitive with the best known results. To illustrate the versatility and general applicability of our approach, we give new approximation algorithms for the Max-Cut problem with side constraints that crucially utilize measure concentration results for the Sticky Brownian Motion, a feature missing from hyperplane rounding and its generalizations.
Submitted 19 October, 2019; v1 submitted 19 December, 2018;
originally announced December 2018.
-
Towards Instance-Optimal Private Query Release
Authors:
Jaroslaw Blasiok,
Mark Bun,
Aleksandar Nikolov,
Thomas Steinke
Abstract:
We study efficient mechanisms for the query release problem in differential privacy: given a workload of $m$ statistical queries, output approximate answers to the queries while satisfying the constraints of differential privacy. In particular, we are interested in mechanisms that optimally adapt to the given workload. Building on the projection mechanism of Nikolov, Talwar, and Zhang, and using the ideas behind Dudley's chaining inequality, we propose new efficient algorithms for the query release problem, and prove that they achieve optimal sample complexity for the given workload (up to constant factors, in certain parameter regimes) with respect to the class of mechanisms that satisfy concentrated differential privacy. We also give variants of our algorithms that satisfy local differential privacy, and prove that they also achieve optimal sample complexity among all local sequentially interactive private mechanisms.
△ Less
Submitted 8 November, 2018;
originally announced November 2018.
-
Proportional Volume Sampling and Approximation Algorithms for A-Optimal Design
Authors:
Aleksandar Nikolov,
Mohit Singh,
Uthaipon Tao Tantipongpipat
Abstract:
We study the optimal design problems where the goal is to choose a set of linear measurements to obtain the most accurate estimate of an unknown vector in $d$ dimensions. We study the $A$-optimal design variant where the objective is to minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement in wireless networks, sparse least squares regression, feature selection for $k$-means clustering, and matrix approximation. In this paper, we introduce proportional volume sampling to obtain improved approximation algorithms for $A$-optimal design. Our main result is to obtain improved approximation algorithms for the $A$-optimal design problem by introducing the proportional volume sampling algorithm. Our results give nearly optimal bounds in the asymptotic regime when the number of measurements done, $k$, is significantly more than the dimension $d$. We also give the first approximation algorithms when $k$ is small, including when $k=d$. The proportional volume-sampling algorithm also gives approximation algorithms for other optimal design objectives such as $D$-optimal design and generalized ratio objective, matching or improving previous best known results. Interestingly, we show that a similar guarantee cannot be obtained for the $E$-optimal design problem. We also show that the $A$-optimal design problem is NP-hard to approximate within a fixed constant when $k=d$.
△ Less
Submitted 17 July, 2018; v1 submitted 22 February, 2018;
originally announced February 2018.
-
Tusnády's problem, the transference principle, and non-uniform QMC sampling
Authors:
Christoph Aistleitner,
Dmitriy Bilyk,
Aleksandar Nikolov
Abstract:
It is well-known that for every $N \geq 1$ and $d \geq 1$ there exist point sets $x_1, \dots, x_N \in [0,1]^d$ whose discrepancy with respect to the Lebesgue measure is of order at most $(\log N)^{d-1} N^{-1}$. In a more general setting, the first author proved together with Josef Dick that for any normalized measure $μ$ on $[0,1]^d$ there exist points $x_1, \dots, x_N$ whose discrepancy with respect to $μ$ is of order at most $(\log N)^{(3d+1)/2} N^{-1}$. The proof used methods from combinatorial mathematics, and in particular a result of Banaszczyk on balancings of vectors. In the present note we use a version of the so-called transference principle together with recent results on the discrepancy of red-blue colorings to show that for any $μ$ there even exist points having discrepancy of order at most $(\log N)^{d-\frac12} N^{-1}$, which is almost as good as the discrepancy bound in the case of the Lebesgue measure.
△ Less
Submitted 17 March, 2017;
originally announced March 2017.
-
Tighter Bounds for the Discrepancy of Boxes and Polytopes
Authors:
Aleksandar Nikolov
Abstract:
Combinatorial discrepancy is a complexity measure of a collection of sets which quantifies how well the sets in the collection can be simultaneously balanced. More precisely, we are given an $n$-point set $P$, and a collection $\mathcal{F} = \{F_1, ..., F_m\}$ of subsets of $P$, and our goal is to color $P$ with two colors, red and blue, so that the maximum over the $F_i$ of the absolute difference between the number of red elements and the number of blue elements (the discrepancy) is minimized. Combinatorial discrepancy has many applications in mathematics and computer science, including constructions of uniformly distributed point sets, and lower bounds for data structures and private data analysis algorithms.
We investigate the combinatorial discrepancy of geometrically defined systems, in which $P$ is an $n$-point set in $d$-dimensional space, and $\mathcal{F}$ is the collection of subsets of $P$ induced by dilations and translations of a fixed convex polytope $B$. Such set systems include systems of sets induced by axis-aligned boxes, whose discrepancy is the subject of the well known Tusnady problem. We prove new discrepancy upper and lower bounds for such set systems by extending the approach based on factorization norms previously used by the author and Matousek. We improve the best known upper bound for the Tusnady problem by a logarithmic factor, using a result of Banaszczyk on signed series of vectors. We extend this improvement to any arbitrary convex polytope $B$ by using a decomposition due to Matousek. Using Fourier analytic techniques, we also prove a nearly matching discrepancy lower bound for sets induced by any fixed bounded polytope $B$ satisfying a certain technical condition.
We also outline applications of our results to geometric discrepancy, data structure lower bounds, and differential privacy.
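The definition of combinatorial discrepancy given in this abstract can be checked directly on tiny instances. The following is a minimal brute-force sketch, assuming points are indexed $0,\dots,n-1$ and each set is given as a collection of indices; the function name `discrepancy` is hypothetical, and the search over all $2^n$ colorings is only meant to illustrate the definition, not the techniques of the paper.

```python
from itertools import product

def discrepancy(n_points, sets):
    """Minimum over red/blue colorings of the largest imbalance of any set."""
    best = None
    for coloring in product((1, -1), repeat=n_points):  # +1 = red, -1 = blue
        # Imbalance of a set = |#red - #blue| = |sum of the signs of its points|.
        worst = max(abs(sum(coloring[i] for i in s)) for s in sets)
        if best is None or worst < best:
            best = worst
    return best

# All intervals of four points on a line: alternating colors balance every interval.
intervals = [range(a, b) for a in range(4) for b in range(a + 1, 5)]
print(discrepancy(4, intervals))
```

For the interval system an alternating coloring keeps every set within one element of balance, whereas for the three pairs of a three-point set some pair always receives a single color.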
Submitted 15 April, 2017; v1 submitted 19 January, 2017;
originally announced January 2017.
-
Towards a Constructive Version of Banaszczyk's Vector Balancing Theorem
Authors:
Daniel Dadush,
Shashwat Garg,
Shachar Lovett,
Aleksandar Nikolov
Abstract:
An important theorem of Banaszczyk (Random Structures & Algorithms '98) states that for any sequence of vectors of $\ell_2$ norm at most $1/5$ and any convex body $K$ of Gaussian measure $1/2$ in $\mathbb{R}^n$, there exists a signed combination of these vectors which lands inside $K$. A major open problem is to devise a constructive version of Banaszczyk's vector balancing theorem, i.e. to find an efficient algorithm which constructs the signed combination.
We make progress towards this goal along several fronts. As our first contribution, we show an equivalence between Banaszczyk's theorem and the existence of $O(1)$-subgaussian distributions over signed combinations. For the case of symmetric convex bodies, our equivalence implies the existence of a universal signing algorithm (i.e. independent of the body), which simply samples from the subgaussian sign distribution and checks to see if the associated combination lands inside the body. For asymmetric convex bodies, we provide a novel recentering procedure, which allows us to reduce to the case where the body is symmetric.
As our second main contribution, we show that the above framework can be efficiently implemented when the vectors have length $O(1/\sqrt{\log n})$, recovering Banaszczyk's results under this stronger assumption. More precisely, we use random walk techniques to produce the required $O(1)$-subgaussian signing distributions when the vectors have length $O(1/\sqrt{\log n})$, and use a stochastic gradient ascent method to implement the recentering procedure for asymmetric bodies.
Submitted 13 December, 2016;
originally announced December 2016.
-
Lower Bounds for Differential Privacy from Gaussian Width
Authors:
Assimakis Kattis,
Aleksandar Nikolov
Abstract:
We study the optimal sample complexity of a given workload of linear queries under the constraints of differential privacy. The sample complexity of a query answering mechanism under error parameter $α$ is the smallest $n$ such that the mechanism answers the workload with error at most $α$ on any database of size $n$. Following a line of research started by Hardt and Talwar [STOC 2010], we analyze sample complexity using the tools of asymptotic convex geometry. We study the sensitivity polytope, a natural convex body associated with a query workload that quantifies how query answers can change between neighboring databases. This is the information that, roughly speaking, is protected by a differentially private algorithm, and, for this reason, we expect that a "bigger" sensitivity polytope implies larger sample complexity. Our results identify the mean Gaussian width as an appropriate measure of the size of the polytope, and show sample complexity lower bounds in terms of this quantity. Our lower bounds completely characterize the workloads for which the Gaussian noise mechanism is optimal up to constants as those having asymptotically maximal Gaussian width.
Our techniques also yield an alternative proof of Pisier's Volume Number Theorem which also suggests an approach to improving the parameters of the theorem.
Submitted 8 December, 2016;
originally announced December 2016.
-
Approximate Near Neighbors for General Symmetric Norms
Authors:
Alexandr Andoni,
Huy L. Nguyen,
Aleksandar Nikolov,
Ilya Razenshteyn,
Erik Waingarten
Abstract:
We show that every symmetric normed space admits an efficient nearest neighbor search data structure with doubly-logarithmic approximation. Specifically, for every $n$, $d = n^{o(1)}$, and every $d$-dimensional symmetric norm $\|\cdot\|$, there exists a data structure for $\mathrm{poly}(\log \log n)$-approximate nearest neighbor search over $\|\cdot\|$ for $n$-point datasets achieving $n^{o(1)}$ query time and $n^{1+o(1)}$ space. The main technical ingredient of the algorithm is a low-distortion embedding of a symmetric norm into a low-dimensional iterated product of top-$k$ norms.
We also show that our techniques cannot be extended to general norms.
Submitted 24 July, 2017; v1 submitted 18 November, 2016;
originally announced November 2016.
-
An Improved Private Mechanism for Small Databases
Authors:
Aleksandar Nikolov
Abstract:
We study the problem of answering a workload of linear queries $\mathcal{Q}$, on a database of size at most $n = o(|\mathcal{Q}|)$ drawn from a universe $\mathcal{U}$ under the constraint of (approximate) differential privacy. Nikolov, Talwar, and Zhang [NTZ] proposed an efficient mechanism that, for any given $\mathcal{Q}$ and $n$, answers the queries with average error that is at most a factor polynomial in $\log |\mathcal{Q}|$ and $\log |\mathcal{U}|$ worse than the best possible. Here we improve on this guarantee and give a mechanism whose competitiveness ratio is at most polynomial in $\log n$ and $\log |\mathcal{U}|$, and has no dependence on $|\mathcal{Q}|$. Our mechanism is based on the projection mechanism of Nikolov, Talwar, and Zhang, but in place of an ad-hoc noise distribution, we use a distribution which is in a sense optimal for the projection mechanism, and analyze it using convex duality and the restricted invertibility principle.
△ Less
Submitted 1 May, 2015;
originally announced May 2015.
-
Randomized Rounding for the Largest Simplex Problem
Authors:
Aleksandar Nikolov
Abstract:
The maximum volume $j$-simplex problem asks to compute the $j$-dimensional simplex of maximum volume inside the convex hull of a given set of $n$ points in $\mathbb{Q}^d$. We give a deterministic approximation algorithm for this problem which achieves an approximation ratio of $e^{j/2 + o(j)}$. The problem is known to be $\mathrm{NP}$-hard to approximate within a factor of $c^{j}$ for some constant $c > 1$. Our algorithm also gives a factor $e^{j + o(j)}$ approximation for the problem of finding the principal $j\times j$ submatrix of a rank $d$ positive semidefinite matrix with the largest determinant. We achieve our approximation by rounding solutions to a generalization of the $D$-optimal design problem, or, equivalently, the dual of an appropriate smallest enclosing ellipsoid problem. Our arguments give a short and simple proof of a restricted invertibility principle for determinants.
Submitted 14 April, 2015; v1 submitted 28 November, 2014;
originally announced December 2014.
-
Factorization Norms and Hereditary Discrepancy
Authors:
Jiri Matousek,
Aleksandar Nikolov,
Kunal Talwar
Abstract:
The $γ_2$ norm of a real $m\times n$ matrix $A$ is the minimum number $t$ such that the column vectors of $A$ are contained in a $0$-centered ellipsoid $E\subseteq\mathbb{R}^m$ which in turn is contained in the hypercube $[-t, t]^m$. We prove that this classical quantity approximates the \emph{hereditary discrepancy} $\mathrm{herdisc}\ A$ as follows: $γ_2(A) = {O(\log m)}\cdot \mathrm{herdisc}\ A$ and $\mathrm{herdisc}\ A = O(\sqrt{\log m}\,)\cdotγ_2(A) $. Since $γ_2$ is polynomial-time computable, this gives a polynomial-time approximation algorithm for hereditary discrepancy. Both inequalities are shown to be asymptotically tight.
We then demonstrate on several examples the power of the $γ_2$ norm as a tool for proving lower and upper bounds in discrepancy theory. Most notably, we prove a new lower bound of $Ω(\log^{d-1} n)$ for the \emph{$d$-dimensional Tusnády problem}, asking for the combinatorial discrepancy of an $n$-point set in $\mathbb{R}^d$ with respect to axis-parallel boxes. For $d>2$, this improves the previous best lower bound, which was of order approximately $\log^{(d-1)/2}n$, and it comes close to the best known upper bound of $O(\log^{d+1/2}n)$, for which we also obtain a new, very simple proof.
Submitted 8 April, 2015; v1 submitted 6 August, 2014;
originally announced August 2014.
-
Parallel Algorithms for Geometric Graph Problems
Authors:
Alexandr Andoni,
Aleksandar Nikolov,
Krzysztof Onak,
Grigory Yaroslavtsev
Abstract:
We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+ε)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years.
We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_ε(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+ε)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+ε)$-approximation algorithm with $n^δ$ space in the streaming-with-sorting model with $1/δ^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.
Submitted 4 January, 2014; v1 submitted 30 December, 2013;
originally announced January 2014.
-
Approximating Hereditary Discrepancy via Small Width Ellipsoids
Authors:
Aleksandar Nikolov,
Kunal Talwar
Abstract:
The Discrepancy of a hypergraph is the minimum attainable value, over two-colorings of its vertices, of the maximum absolute imbalance of any hyperedge. The Hereditary Discrepancy of a hypergraph, defined as the maximum discrepancy of a restriction of the hypergraph to a subset of its vertices, is a measure of its complexity. Lovasz, Spencer and Vesztergombi (1986) related the natural extension of this quantity to matrices to rounding algorithms for linear programs, and gave a determinant-based lower bound on the hereditary discrepancy. Matousek (2011) showed that this bound is tight up to a polylogarithmic factor, leaving open the question of actually computing this bound. Recent work by Nikolov, Talwar and Zhang (2013) showed a polynomial time $\tilde{O}(\log^3 n)$-approximation to hereditary discrepancy, as a by-product of their work in differential privacy. In this paper, we give a direct simple $O(\log^{3/2} n)$-approximation algorithm for this problem. We show that up to this approximation factor, the hereditary discrepancy of a matrix $A$ is characterized by the optimal value of a simple geometric convex program that seeks to minimize the largest $\ell_{\infty}$ norm of any point in an ellipsoid containing the columns of $A$. This characterization promises to be a useful tool in discrepancy theory.
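The two definitions at the start of this abstract can be made concrete with a brute-force sketch. It is only an illustration of the definitions, not the approximation algorithm of the paper: it enumerates all colorings and all vertex subsets, so it is exponential and usable only on tiny examples. The function names and the example set system are assumptions introduced here.

```python
from itertools import combinations, product


def discrepancy(sets, vertices):
    """Minimum, over +/-1 colorings of `vertices`, of the largest absolute
    imbalance of any set (each set is implicitly restricted to `vertices`)."""
    vertices = list(vertices)
    if not vertices:
        return 0
    best = None
    for signs in product((-1, 1), repeat=len(vertices)):
        color = dict(zip(vertices, signs))
        # Imbalance of a set = |sum of the colors of its (restricted) elements|.
        worst = max(
            (abs(sum(color[v] for v in s if v in color)) for s in sets),
            default=0,
        )
        if best is None or worst < best:
            best = worst
    return best


def hereditary_discrepancy(sets, vertices):
    """Maximum discrepancy over all restrictions to a subset of the vertices."""
    vertices = list(vertices)
    return max(
        discrepancy(sets, subset)
        for r in range(len(vertices) + 1)
        for subset in combinations(vertices, r)
    )


if __name__ == "__main__":
    # Hypothetical example: a single hyperedge on four vertices.
    example = [{0, 1, 2, 3}]
    print(discrepancy(example, range(4)))             # 0: two vertices of each color
    print(hereditary_discrepancy(example, range(4)))  # 1: restrict to a single vertex
```

The example shows why the hereditary version is the more robust complexity measure: the full set system can be balanced perfectly, but one of its restrictions cannot.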
Submitted 23 July, 2014; v1 submitted 24 November, 2013;
originally announced November 2013.
-
On The Hereditary Discrepancy of Homogeneous Arithmetic Progressions
Authors:
Aleksandar Nikolov,
Kunal Talwar
Abstract:
We show that the hereditary discrepancy of homogeneous arithmetic progressions is lower bounded by $n^{1/O(\log \log n)}$. This bound is tight up to the constant in the exponent. Our lower bound goes via proving an exponential lower bound on the discrepancy of set systems of subcubes of the boolean cube $\{0, 1\}^d$.
Submitted 8 April, 2015; v1 submitted 23 September, 2013;
originally announced September 2013.
-
Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations
Authors:
Cynthia Dwork,
Aleksandar Nikolov,
Kunal Talwar
Abstract:
Consider a database of $n$ people, each represented by a bit-string of length $d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal query is specified by a subset $S$ of $k$ attributes, and a $|S|$-dimensional binary vector $β$ specifying their values. The result for this query is a count of the number of people in the database whose attribute vector restricted to $S$ agrees with $β$.
Privately releasing approximate answers to a set of $k$-way marginal queries is one of the most important and well-motivated problems in differential privacy. Information theoretically, the error complexity of marginal queries is well-understood: the per-query additive error is known to be at least $Ω(\min\{\sqrt{n},d^{\frac{k}{2}}\})$ and at most $\tilde{O}(\min\{\sqrt{n} d^{1/4},d^{\frac{k}{2}}\})$. However, no polynomial time algorithm with error complexity as low as the information theoretic upper bound is known for small $n$. In this work we present a polynomial time algorithm that, for any distribution on marginal queries, achieves average error at most $\tilde{O}(\sqrt{n} d^{\frac{\lceil k/2 \rceil}{4}})$. This error bound is as good as the best known information theoretic upper bounds for $k=2$. This bound is an improvement over previous work on efficiently releasing marginals when $k$ is small and when error $o(n)$ is desirable. Using private boosting we are also able to give nearly matching worst-case error bounds.
Our algorithms are based on the geometric techniques of Nikolov, Talwar, and Zhang. The main new ingredients are convex relaxations and careful use of the Frank-Wolfe algorithm for constrained convex minimization. To design our relaxations, we rely on the Grothendieck inequality from functional analysis.
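As a small illustration of the query class defined at the start of this abstract, the following sketch computes the exact (non-private) answer to a $k$-way marginal query. It does not implement the private release algorithm of the paper; the function name, the tuple representation of rows, and the toy database are assumptions made for the example.

```python
def marginal_count(database, S, beta):
    """Exact answer to a k-way marginal query.

    database: iterable of rows, each a tuple of d bits.
    S:        tuple of k attribute indices.
    beta:     tuple of k bits, the required values on the attributes in S.
    Returns the number of rows whose restriction to S equals beta.
    """
    if len(S) != len(beta):
        raise ValueError("S and beta must have the same length")
    return sum(
        all(row[i] == b for i, b in zip(S, beta))
        for row in database
    )


if __name__ == "__main__":
    # Hypothetical database: n = 4 people, d = 3 binary attributes.
    db = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]
    # 2-way marginal: attribute 0 equals 1 and attribute 2 equals 1.
    print(marginal_count(db, S=(0, 2), beta=(1, 1)))  # 2
```

A private mechanism would release a noisy approximation of such counts for many choices of $S$ and $β$ at once; the error bounds quoted in the abstract describe how small that noise can be.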
Submitted 6 August, 2013;
originally announced August 2013.
-
Nearly Optimal Private Convolution
Authors:
Nadia Fawaz,
S. Muthukrishnan,
Aleksandar Nikolov
Abstract:
We study computing the convolution of a private input $x$ with a public input $h$, while satisfying the guarantees of $(ε, δ)$-differential privacy. Convolution is a fundamental operation, intimately related to Fourier Transforms. In our setting, the private input may represent a time series of sensitive events or a histogram of a database of confidential personal information. Convolution then captures important primitives including linear filtering, which is an essential tool in time series analysis, and aggregation queries on projections of the data.
We give a nearly optimal algorithm for computing convolutions while satisfying $(ε, δ)$-differential privacy. Surprisingly, we follow the simple strategy of adding independent Laplacian noise to each Fourier coefficient and bounding the privacy loss using the composition theorem of Dwork, Rothblum, and Vadhan. We derive a closed form expression for the optimal noise to add to each Fourier coefficient using convex programming duality. Our algorithm is very efficient -- it is essentially no more computationally expensive than a Fast Fourier Transform.
To prove near optimality, we use the recent discrepancy lower bounds of Muthukrishnan and Nikolov and derive a spectral lower bound using a characterization of discrepancy in terms of determinants.
Submitted 27 January, 2013;
originally announced January 2013.
-
The Komlos Conjecture Holds for Vector Colorings
Authors:
Aleksandar Nikolov
Abstract:
The Komlos conjecture in discrepancy theory states that for some constant K and for any m by n matrix A whose columns lie in the unit ball there exists a +/- 1 vector x such that the infinity norm of Ax is bounded above by K. This conjecture also implies the Beck-Fiala conjecture on the discrepancy of bounded degree hypergraphs. Here we prove a natural relaxation of the Komlos conjecture: if the columns of A are assigned unit real vectors rather than +/- 1 then the Komlos conjecture holds with K=1. Our result rules out the possibility of a counterexample to the conjecture based on semidefinite programming. It also opens the way to proving tighter efficient (polynomial-time computable) upper bounds for the conjecture using semidefinite programming techniques.
Submitted 1 August, 2013; v1 submitted 17 January, 2013;
originally announced January 2013.
-
The Geometry of Differential Privacy: the Sparse and Approximate Cases
Authors:
Aleksandar Nikolov,
Kunal Talwar,
Li Zhang
Abstract:
In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work. For a set of $d$ linear queries over a database $x \in \R^N$, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, an $O(\log^2 d)$ approximation to the optimal mechanism is known. Our first contribution is to give an $O(\log^2 d)$ approximation guarantee for the case of $(\eps,δ)$-differential privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex geometry.
We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when $d > n \triangleq \|x\|_1$. It is known that better mechanisms exist in this setting. Our second main contribution is to give an $(\eps,δ)$-differentially private mechanism which is optimal up to a $\polylog(d,N)$ factor for any given query set $A$ and any given upper bound $n$ on $\|x\|_1$. This approximation is achieved by coupling the Gaussian noise addition approach with a linear regression step. We give an analogous result for the $\eps$-differential privacy setting. We also improve on the mean squared error upper bound for answering counting queries on a database of size $n$ by Blum, Ligett, and Roth, and match the lower bound implied by the work of Dinur and Nissim up to logarithmic factors.
The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix $A$.
Submitted 3 December, 2012;
originally announced December 2012.
-
Optimal Private Halfspace Counting via Discrepancy
Authors:
S. Muthukrishnan,
Aleksandar Nikolov
Abstract:
A range counting problem is specified by a set $P$ of size $|P| = n$ of points in $\mathbb{R}^d$, an integer weight $x_p$ associated to each point $p \in P$, and a range space ${\cal R} \subseteq 2^{P}$. Given a query range $R \in {\cal R}$, the target output is $R(\vec{x}) = \sum_{p \in R}{x_p}$. Range counting for different range spaces is a central problem in Computational Geometry.
We study $(ε, δ)$-differentially private algorithms for range counting. Our main results are for the range space given by hyperplanes, that is, the halfspace counting problem. We present an $(ε, δ)$-differentially private algorithm for halfspace counting in $d$ dimensions which achieves $O(n^{1-1/d})$ average squared error. This contrasts with the $Ω(n)$ lower bound established by the classical result of Dinur and Nissim [PODS 2003] for arbitrary subset counting queries. We also show a matching lower bound on average squared error for any $(ε, δ)$-differentially private algorithm for halfspace counting. Both bounds are obtained using discrepancy theory. For the lower bound, we use a modified discrepancy measure and bound approximation of $(ε, δ)$-differentially private algorithms for range counting queries in terms of this discrepancy. We also relate the modified discrepancy measure to classical combinatorial discrepancy, which allows us to exploit known discrepancy lower bounds. This approach also yields a lower bound of $Ω((\log n)^{d-1})$ for $(ε, δ)$-differentially private orthogonal range counting in $d$ dimensions, the first known superconstant lower bound for this problem. For the upper bound, we use an approach inspired by partial coloring methods for proving discrepancy upper bounds, and obtain $(ε, δ)$-differentially private algorithms for range counting with polynomially bounded shatter function range spaces.
Submitted 24 March, 2012;
originally announced March 2012.
-
Private Decayed Sum Estimation under Continual Observation
Authors:
Jean Bolot,
Nadia Fawaz,
S. Muthukrishnan,
Aleksandar Nikolov,
Nina Taft
Abstract:
In monitoring applications, recent data is more important than distant data. How does this affect privacy of data analysis? We study a general class of data analyses - computing predicate sums - with privacy. Formally, we study the problem of estimating predicate sums {\em privately}, for sliding windows (and other well-known decay models of data, i.e. exponential and polynomial decay). We extend the recently proposed continual privacy model of Dwork et al.
We present algorithms for decayed sum which are $\eps$-differentially private, and are accurate. For window and exponential decay sums, our algorithms are accurate up to additive $1/\eps$ and polylog terms in the range of the computed function; for polynomial decay sums which are technically more challenging because partial solutions do not compose easily, our algorithms incur additional relative error. Further, we show lower bounds, tight within polylog factors and tight with respect to the dependence on the probability of error.
Submitted 2 March, 2012; v1 submitted 30 August, 2011;
originally announced August 2011.
-
A counterexample to Beck's conjecture on the discrepancy of three permutations
Authors:
Alantha Newman,
Aleksandar Nikolov
Abstract:
Given three permutations on the integers 1 through n, consider the set system consisting of each interval in each of the three permutations. Jozsef Beck conjectured (c. 1987) that the discrepancy of this set system is O(1). We give a counterexample to this conjecture: for any positive integer n = 3^k, we exhibit three permutations whose corresponding set system has discrepancy Omega(log(n)). Our counterexample is based on a simple recursive construction, and our proof of the discrepancy lower bound is by induction. This example also disproves a generalization of Beck's conjecture due to Spencer, Srinivasan and Tetali, who conjectured that a set system corresponding to l permutations has discrepancy O(sqrt(l)).
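The set system in this abstract is easy to explore by exhaustive search on very small inputs. The sketch below is only an illustration of the definition, not the recursive construction of the paper: it computes the discrepancy of the system of all intervals of a list of permutations by trying every coloring. Elements are numbered from 0 rather than 1, and the function name and example permutations are assumptions made here.

```python
from itertools import product


def interval_discrepancy(perms):
    """Discrepancy of the set system formed by every interval (contiguous
    run) of every permutation in `perms`, by brute force over colorings.

    Each permutation is a sequence containing 0..n-1 exactly once.
    """
    n = len(perms[0])
    best = n  # no interval can have imbalance larger than n
    for signs in product((-1, 1), repeat=n):
        worst = 0
        for perm in perms:
            # An interval sum is a difference of two prefix sums, so the
            # largest absolute interval sum is max(prefix) - min(prefix),
            # where the empty prefix (sum 0) is included.
            prefix = lo = hi = 0
            for v in perm:
                prefix += signs[v]
                lo = min(lo, prefix)
                hi = max(hi, prefix)
            worst = max(worst, hi - lo)
        best = min(best, worst)
    return best


if __name__ == "__main__":
    # One permutation: alternating colors keep every interval within 1.
    print(interval_discrepancy([(0, 1, 2, 3)]))  # 1
    # Three cyclic shifts of (0, 1, 2): every pair of elements is adjacent
    # in some permutation, so two equal colors always meet in an interval.
    print(interval_discrepancy([(0, 1, 2), (1, 2, 0), (2, 0, 1)]))  # 2
```

The search takes time exponential in $n$, so it can only check small cases; the $Ω(\log n)$ lower bound in the abstract is proved by induction, not by enumeration.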
Submitted 14 April, 2011;
originally announced April 2011.
-
Pan-private Algorithms: When Memory Does Not Help
Authors:
Darakhshan Mir,
S. Muthukrishnan,
Aleksandar Nikolov,
Rebecca N. Wright
Abstract:
Consider updates arriving online in which the $t$th input is $(i_t,d_t)$, where $i_t$'s are thought of as IDs of users. Informally, a randomized function $f$ is {\em differentially private} with respect to the IDs if the probability distribution induced by $f$ is not much different from that induced by it on an input in which occurrences of an ID $j$ are replaced with some other ID $k$. Recently, this notion was extended to {\em pan-privacy} where the computation of $f$ retains differential privacy, even if the internal memory of the algorithm is exposed to the adversary (say by a malicious break-in or by fiat by the government). This is a strong notion of privacy, and surprisingly, for basic counting tasks such as distinct counts, heavy hitters and others, Dwork et al~\cite{dwork-pan} present pan-private algorithms with reasonable accuracy. The pan-private algorithms are nontrivial, and rely on sampling. We reexamine these basic counting tasks and show improved bounds. In particular, we estimate the distinct count $\Dt$ to within $(1\pm \eps)\Dt \pm O(\polylog m)$, where $m$ is the number of elements in the universe. This uses suitably noisy statistics on sketches known in the streaming literature. We also present the first known lower bounds for pan-privacy with respect to a single intrusion. Our lower bounds show that, even if allowed to work with unbounded memory, pan-private algorithms for distinct counts cannot be significantly more accurate than our algorithms. Our lower bound uses noisy decoding. For heavy hitter counts, we present a pan-private streaming algorithm that is accurate to within $O(k)$ in the worst case; the previously known bound for this problem is arbitrarily worse. An interesting aspect of our pan-private algorithms is that they deliberately use very small (polylogarithmic) space and tend to be streaming algorithms, even though using more space is not forbidden.
Submitted 8 September, 2010;
originally announced September 2010.
-
Comparison of the Performance of Two Service Disciplines for a Shared Bus Multiprocessor with Private Caches
Authors:
Angel Vassilev Nikolov,
Lerato Lerato
Abstract:
In this paper, we compare two analytical models for evaluation of cache coherence overhead of a shared bus multiprocessor with private caches. The models are based on a closed queuing network with different service disciplines. We find that the priority discipline can be used as a lower-level bound. Some numerical results are shown graphically.
Submitted 20 April, 2010;
originally announced April 2010.
-
Limits of Approximation Algorithms: PCPs and Unique Games (DIMACS Tutorial Lecture Notes)
Authors:
Prahladh Harsha,
Moses Charikar,
Matthew Andrews,
Sanjeev Arora,
Subhash Khot,
Dana Moshkovitz,
Lisa Zhang,
Ashkan Aazami,
Dev Desai,
Igor Gorodezky,
Geetha Jagannathan,
Alexander S. Kulikov,
Darakhshan J. Mir,
Alantha Newman,
Aleksandar Nikolov,
David Pritchard,
Gwen Spencer
Abstract:
These are the lecture notes for the DIMACS Tutorial "Limits of Approximation Algorithms: PCPs and Unique Games" held at the DIMACS Center, CoRE Building, Rutgers University on 20-21 July, 2009. This tutorial was jointly sponsored by the DIMACS Special Focus on Hardness of Approximation, the DIMACS Special Focus on Algorithmic Foundations of the Internet, and the Center for Computational Intractability with support from the National Security Agency and the National Science Foundation.
The speakers at the tutorial were Matthew Andrews, Sanjeev Arora, Moses Charikar, Prahladh Harsha, Subhash Khot, Dana Moshkovitz and Lisa Zhang. The scribes were Ashkan Aazami, Dev Desai, Igor Gorodezky, Geetha Jagannathan, Alexander S. Kulikov, Darakhshan J. Mir, Alantha Newman, Aleksandar Nikolov, David Pritchard and Gwen Spencer.
Submitted 20 February, 2010;
originally announced February 2010.
-
On the Uncertainty Relations in Stochastic Mechanics
Authors:
D. A. Trifonov,
B. A. Nikolov,
I. M. Mladenov
Abstract:
It is shown that the Bohm equations for the phase $S$ and squared modulus $ρ$ of the quantum mechanical wave function can be derived from the classical ensemble equations admitting an additional momentum $p_s$ of the form proportional to the osmotic velocity in the Nelson stochastic mechanics and using the variational principle with appropriate change of variables. The possibility to treat grad$S$ and $p_s$ as two parts of the momentum of quantum ensemble particles is considered from the viewpoint of uncertainty relations of Robertson - Schroedinger type on the examples of the stochastic image of quantum mechanical canonical coherent and squeezed states.
Submitted 18 October, 2009; v1 submitted 23 February, 2009;
originally announced February 2009.
-
Geometric Quantization, Coherent States and Stochastic Measurements
Authors:
B. A. Nikolov,
D. A. Trifonov
Abstract:
The geometric quantization problem is considered from the point of view of the Davies and Lewis approach to quantum mechanics. The influence of the measuring device is accounted for in the classical and quantum case and it is shown that the conditions of the measurement define the type of quantization (Weyl, normal, antinormal, etc.). The quantum states and quantum operators are obtained by means of the projection, defined from the system of generalized coherent states.
Submitted 14 August, 2004;
originally announced August 2004.