-
Continual Mean Estimation Under User-Level Privacy
Authors:
Anand Jerry George,
Lekshmi Ramesh,
Aditya Vikram Singh,
Himanshu Tyagi
Abstract:
We consider the problem of continually releasing an estimate of the population mean of a stream of samples that is user-level differentially private (DP). At each time instant, a user contributes a sample, and the users can arrive in arbitrary order. Until now, these requirements of continual release and user-level privacy were considered in isolation. But, in practice, both these requirements come together, as the users often contribute data repeatedly and multiple queries are made. We provide an algorithm that outputs a mean estimate at every time instant $t$ such that the overall release is user-level $\varepsilon$-DP and has the following error guarantee: Denoting by $M_t$ the maximum number of samples contributed by a user, as long as $\tilde{\Omega}(1/\varepsilon)$ users have $M_t/2$ samples each, the error at time $t$ is $\tilde{O}(1/\sqrt{t}+\sqrt{M_t}/(t\varepsilon))$. This is a universal error guarantee which is valid for all arrival patterns of the users. Furthermore, it (almost) matches the existing lower bounds for the single-release setting at all time instants when users have contributed an equal number of samples.
Submitted 19 December, 2022;
originally announced December 2022.
-
Multiple Support Recovery Using Very Few Measurements Per Sample
Authors:
Lekshmi Ramesh,
Chandra R. Murthy,
Himanshu Tyagi
Abstract:
In the problem of multiple support recovery, we are given access to linear measurements of multiple sparse samples in $\mathbb{R}^{d}$. These samples can be partitioned into $\ell$ groups, with samples having the same support belonging to the same group. For a given budget of $m$ measurements per sample, the goal is to recover the $\ell$ underlying supports, in the absence of the knowledge of group labels. We study this problem with a focus on the measurement-constrained regime where $m$ is smaller than the support size $k$ of each sample. We design a two-step procedure that estimates the union of the underlying supports first, and then uses a spectral algorithm to estimate the individual supports. Our proposed estimator can recover the supports with $m<k$ measurements per sample, from $\tilde{O}(k^{4}\ell^{4}/m^{4})$ samples. Our guarantees hold for a general, generative model assumption on the samples and measurement matrices. We also provide results from experiments conducted on synthetic data and on the MNIST dataset.
Submitted 20 May, 2021;
originally announced May 2021.
-
Phase Transitions for Support Recovery from Gaussian Linear Measurements
Authors:
Lekshmi Ramesh,
Chandra R. Murthy,
Himanshu Tyagi
Abstract:
We study the problem of recovering the common $k$-sized support of a set of $n$ samples of dimension $d$, using $m$ noisy linear measurements per sample. Most prior work has focused on the case when $m$ exceeds $k$, in which case $n$ of the order $(k/m)\log(d/k)$ is both necessary and sufficient. Thus, in this regime, only the total number of measurements across the samples matters, and there is not much benefit in getting more than $k$ measurements per sample. In the measurement-constrained regime where we have access to fewer than $k$ measurements per sample, we show an upper bound of $O((k^{2}/m^{2})\log d)$ on the sample complexity for successful support recovery when $m\ge 2\log d$. Along with the lower bound from our previous work, this shows a phase transition for the sample complexity of this problem around $k/m=1$. In fact, our proposed algorithm is sample-optimal in both regimes. It follows that, in the $m\ll k$ regime, multiple measurements from the same sample are more valuable than measurements from different samples.
Submitted 12 May, 2021; v1 submitted 30 January, 2021;
originally announced February 2021.
-
Sample-Measurement Tradeoff in Support Recovery under a Subgaussian Prior
Authors:
Lekshmi Ramesh,
Chandra R Murthy,
Himanshu Tyagi
Abstract:
Data samples from $\mathbb{R}^{d}$ with a common support of size $k$ are accessed through $m$ random linear projections (measurements) per sample. It is well known that roughly $k$ measurements from a single sample are sufficient to recover the support. In the multiple sample setting, do $k$ overall measurements still suffice when only $m$ measurements per sample are allowed, with $m<k$? We answer this question in the negative by considering a generative model setting with independent samples drawn from a subgaussian prior. We show that $n=\Theta((k^2/m^2)\cdot\log k(d-k))$ samples are necessary and sufficient to recover the support exactly. In turn, this shows that when $m<k$, $k$ overall measurements are insufficient for support recovery; instead we need about $m$ measurements each from $k^{2}/m^2$ samples, i.e., $k^{2}/m$ overall measurements are necessary.
Submitted 19 September, 2020; v1 submitted 24 December, 2019;
originally announced December 2019.
-
Reliable Mining of Automatically Generated Test Cases from Software Requirements Specification (SRS)
Authors:
Lilly Raamesh,
G. V. Uma
Abstract:
Writing requirements is a two-way process. In this paper we classify Functional Requirements (FR) and Non-Functional Requirements (NFR) statements from Software Requirements Specification (SRS) documents. These are systematically transformed into state charts considering all relevant information. The paper outlines how test cases can be automatically generated from these state charts. The application of the states yields the different test cases as solutions to a planning problem. The test cases can be used for automated or manual software testing at the system level. The paper also presents a method for reducing the test suite using mining methods, thereby facilitating mining and knowledge extraction from test cases.
Submitted 5 February, 2010;
originally announced February 2010.