Search | arXiv e-print repository

doi 10.1145/3651304

FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces

Authors: Safa C. Medin, Gengyan Li, Ruofei Du, Stephan Garbin, Philip Davidson, Gregory W. Wornell, Thabo Beeler, Abhimitra Meka

Abstract: 3D rendering of dynamic face captures is a challenging problem, and it demands improvements on several fronts: photorealism, efficiency, compatibility, and configurability. We present a novel representation that enables high-quality volumetric rendering of an actor's dynamic facial performances with minimal compute and memory footprint. It runs natively on commodity graphics soft- and hardware, and allows for a graceful trade-off between quality and efficiency. Our method utilizes recent advances in neural rendering, particularly learning discrete radiance manifolds to sparsely sample the scene to model volumetric effects. We achieve efficient modeling by learning a single set of manifolds for the entire dynamic sequence, while implicitly modeling appearance changes as temporal canonical texture. We export a single layered mesh and view-independent RGBA texture video that is compatible with legacy graphics renderers without additional ML integration. We demonstrate our method by rendering dynamic face captures of real actors in a game engine, at comparable photorealism to state-of-the-art neural rendering techniques at previously unseen frame rates.

Submitted 21 April, 2024; originally announced April 2024.

Comments: In Proceedings of the ACM in Computer Graphics and Interactive Techniques, 2024

arXiv:2403.08819 [pdf, other]

Thermometer: Towards Universal Calibration for Large Language Models

Authors: Maohao Shen, Subhro Das, Kristjan Greenewald, Prasanna Sattigeri, Gregory Wornell, Soumya Ghosh

Abstract: We consider the issue of calibration in large language models (LLM). Recent studies have found that common interventions such as instruction tuning often result in poorly calibrated LLMs. Although calibration is well-explored in traditional applications, calibrating LLMs is uniquely challenging. These challenges stem as much from the severe computational requirements of LLMs as from their versatility, which allows them to be applied to diverse tasks. Addressing these challenges, we propose THERMOMETER, a calibration approach tailored to LLMs. THERMOMETER learns an auxiliary model, given data from multiple tasks, for calibrating a LLM. It is computationally efficient, preserves the accuracy of the LLM, and produces better-calibrated responses for new tasks. Extensive empirical evaluations across various benchmarks demonstrate the effectiveness of the proposed method.

Submitted 27 June, 2024; v1 submitted 19 February, 2024; originally announced March 2024.

Comments: Camera ready version for ICML 2024

arXiv:2402.06160 [pdf, other]

Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?

Authors: Maohao Shen, J. Jon Ryu, Soumya Ghosh, Yuheng Bu, Prasanna Sattigeri, Subhro Das, Gregory W. Wornell

Abstract: This paper questions the effectiveness of a modern predictive uncertainty quantification approach, called \emph{evidential deep learning} (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite their perceived strong empirical performance on downstream tasks, a line of recent studies by Bengs et al. identify limitations of the existing methods to conclude their learned epistemic uncertainties are unreliable, e.g., in that they are non-vanishing even with infinite data. Building on and sharpening such analysis, we 1) provide a sharper understanding of the asymptotic behavior of a wide class of EDL methods by unifying various objective functions; 2) reveal that the EDL methods can be better interpreted as an out-of-distribution detection algorithm based on energy-based-models; and 3) conduct extensive ablation studies to better assess their empirical effectiveness with real-world datasets. Through all these analyses, we conclude that even when EDL methods are empirically effective on downstream tasks, this occurs despite their poor uncertainty quantification capabilities. Our investigation suggests that incorporating model uncertainty can help EDL methods faithfully quantify uncertainties and further improve performance on representative downstream tasks, albeit at the cost of additional computational complexity.

Submitted 12 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 29 pages, 12 figures

arXiv:2402.03683 [pdf, other]

Gambling-Based Confidence Sequences for Bounded Random Vectors

Authors: J. Jon Ryu, Gregory W. Wornell

Abstract: A confidence sequence (CS) is a sequence of confidence sets that contains a target parameter of an underlying stochastic process at any time step with high probability. This paper proposes a new approach to constructing CSs for means of bounded multivariate stochastic processes using a general gambling framework, extending the recently established coin toss framework for bounded random processes. The proposed gambling framework provides a general recipe for constructing CSs for categorical and probability-vector-valued observations, as well as for general bounded multidimensional observations through a simple reduction. This paper specifically explores the use of the mixture portfolio, akin to Cover's universal portfolio, in the proposed framework and investigates the properties of the resulting CSs. Simulations demonstrate the tightness of these confidence sequences compared to existing methods. When applied to the sampling without-replacement setting for finite categorical data, it is shown that the resulting CS based on a universal gambling strategy is provably tighter than that of the posterior-prior ratio martingale proposed by Waudby-Smith and Ramdas.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 17 pages, 3 figures

arXiv:2402.03655 [pdf, other]

Operator SVD with Neural Networks via Nested Low-Rank Approximation

Authors: J. Jon Ryu, Xiangxiang Xu, H. S. Melihcan Erol, Yuheng Bu, Lizhong Zheng, Gregory W. Wornell

Abstract: Computing eigenvalue decomposition (EVD) of a given linear operator, or finding its leading eigenvalues and eigenfunctions, is a fundamental task in many machine learning and scientific computing problems. For high-dimensional eigenvalue problems, training neural networks to parameterize the eigenfunctions is considered as a promising alternative to the classical numerical linear algebra techniques. This paper proposes a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition, accompanied by new techniques called nesting for learning the top-$L$ singular values and singular functions in the correct order. The proposed method promotes the desired orthogonality in the learned functions implicitly and efficiently via an unconstrained optimization formulation, which is easy to solve with off-the-shelf gradient-based optimization algorithms. We demonstrate the effectiveness of the proposed optimization framework for use cases in computational physics and machine learning.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 44 pages, 7 figures

arXiv:2309.06413 [pdf, other]

On Computationally Efficient Learning of Exponential Family Distributions

Authors: Abhin Shah, Devavrat Shah, Gregory W. Wornell

Abstract: We consider the classical problem of learning, with arbitrary accuracy, the natural parameters of a $k$-parameter truncated \textit{minimal} exponential family from i.i.d. samples in a computationally and statistically efficient manner. We focus on the setting where the support as well as the natural parameters are appropriately bounded. While the traditional maximum likelihood estimator for this class of exponential family is consistent, asymptotically normal, and asymptotically efficient, evaluating it is computationally hard. In this work, we propose a novel loss function and a computationally efficient estimator that is consistent as well as asymptotically normal under mild conditions. We show that, at the population level, our method can be viewed as the maximum likelihood estimation of a re-parameterized distribution belonging to the same class of exponential family. Further, we show that our estimator can be interpreted as a solution to minimizing a particular Bregman score as well as an instance of minimizing the \textit{surrogate} likelihood. We also provide finite sample guarantees to achieve an error (in $\ell_2$-norm) of $α$ in the parameter estimation with sample complexity $O({\sf poly}(k)/α^2)$. Our method achieves the order-optimal sample complexity of $O({\sf log}(k)/α^2)$ when tailored for node-wise-sparse Markov random fields. Finally, we demonstrate the performance of our estimator via numerical experiments.

Submitted 12 September, 2023; originally announced September 2023.

Comments: An earlier version of this work arXiv:2110.15397 was presented at the Neural Information Processing Systems Conference in December 2021 titled "A Computationally Efficient Method for Learning Exponential Family Distributions"

arXiv:2306.14411 [pdf, other]

Score-based Source Separation with Applications to Digital Communication Signals

Authors: Tejas Jayashankar, Gary C. F. Lee, Alejandro Lancho, Amir Weiss, Yury Polyanskiy, Gregory W. Wornell

Abstract: We propose a new method for separating superimposed sources using diffusion-based generative models. Our method relies only on separately trained statistical priors of independent sources to establish a new objective function guided by maximum a posteriori estimation with an $α$-posterior, across multiple levels of Gaussian smoothing. Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature and the recovery of encoded bits from a signal of interest, as measured by the bit error rate (BER). Experimental results with RF mixtures demonstrate that our method results in a BER reduction of 95% over classical and existing learning-based methods. Our analysis demonstrates that our proposed method yields solutions that asymptotically approach the modes of an underlying discrete distribution. Furthermore, our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme, shedding additional light on its use beyond conditional sampling. The project webpage is available at https://alpha-rgs.github.io

Submitted 17 January, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 34 pages, 18 figures, for associated project webpage see https://alpha-rgs.github.io

arXiv:2306.05583 [pdf, other]

Gibbs-Based Information Criteria and the Over-Parameterized Regime

Authors: Haobo Chen, Yuheng Bu, Gregory W. Wornell

Abstract: Double-descent refers to the unexpected drop in test loss of a learning algorithm beyond an interpolating threshold with over-parameterization, which is not predicted by information criteria in their classical forms due to the limitations in the standard asymptotic approach. We update these analyses using the information risk minimization framework and provide Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for models learned by the Gibbs algorithm. Notably, the penalty terms for the Gibbs-based AIC and BIC correspond to specific information measures, i.e., symmetrized KL information and KL divergence. We extend this information-theoretic analysis to over-parameterized models by providing two different Gibbs-based BICs to compute the marginal likelihood of random feature models in the regime where the number of parameters $p$ and the number of samples $n$ tend to infinity, with $p/n$ fixed. Our experiments demonstrate that the Gibbs-based BIC can select the high-dimensional model and reveal the mismatch between marginal likelihood and population risk in the over-parameterized regime, providing new insights to understand double-descent.

Submitted 13 November, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2305.08207 [pdf, other]

A Bilateral Bound on the Mean-Square Error for Estimation in Model Mismatch

Authors: Amir Weiss, Alejandro Lancho, Yuheng Bu, Gregory W. Wornell

Abstract: A bilateral (i.e., upper and lower) bound on the mean-square error under a general model mismatch is developed. The bound, which is derived from the variational representation of the chi-square divergence, is applicable in the Bayesian and nonBayesian frameworks to biased and unbiased estimators. Unlike other classical MSE bounds that depend only on the model, our bound is also estimator-dependent. Thus, it is applicable as a tool for characterizing the MSE of a specific estimator. The proposed bounding technique has a variety of applications, one of which is a tool for proving the consistency of estimators for a class of models. Furthermore, it provides insight as to why certain estimators work well under general model mismatch conditions.

Submitted 14 May, 2023; originally announced May 2023.

Comments: Accepted for publication in Proc. of ISIT 2023

arXiv:2305.00593 [pdf, other]

Reliable Gradient-free and Likelihood-free Prompt Tuning

Authors: Maohao Shen, Soumya Ghosh, Prasanna Sattigeri, Subhro Das, Yuheng Bu, Gregory Wornell

Abstract: Due to privacy or commercial constraints, large pre-trained language models (PLMs) are often offered as black-box APIs. Fine-tuning such models to downstream tasks is challenging because one can neither access the model's internal representations nor propagate gradients through it. This paper addresses these challenges by developing techniques for adapting PLMs with only API access. Building on recent work on soft prompt tuning, we develop methods to tune the soft prompts without requiring gradient computation. Further, we develop extensions that in addition to not requiring gradients also do not need to access any internal representation of the PLM beyond the input embeddings. Moreover, instead of learning a single prompt, our methods learn a distribution over prompts allowing us to quantify predictive uncertainty. Ours is the first work to consider uncertainty in prompts when only having API access to the PLM. Finally, through extensive experiments, we carefully vet the proposed methods and find them competitive with (and sometimes even improving on) gradient-based approaches with full access to the PLM.

Submitted 30 April, 2023; originally announced May 2023.

Comments: EACL 2023 (Findings)

arXiv:2304.14332 [pdf, other]

On the Generalization Error of Meta Learning for the Gibbs Algorithm

Authors: Yuheng Bu, Harsha Vardhan Tetali, Gholamali Aminian, Miguel Rodrigues, Gregory Wornell

Abstract: We analyze the generalization ability of joint-training meta learning algorithms via the Gibbs algorithm. Our exact characterization of the expected meta generalization error for the meta Gibbs algorithm is based on symmetrized KL information, which measures the dependence between all meta-training datasets and the output parameters, including task-specific and meta parameters. Additionally, we derive an exact characterization of the meta generalization error for the super-task Gibbs algorithm, in terms of conditional symmetrized KL information within the super-sample and super-task framework introduced in Steinke and Zakynthinou (2020) and Hellstrom and Durisi (2022) respectively. Our results also enable us to provide novel distribution-free generalization error upper bounds for these Gibbs algorithms applicable to meta learning.

Submitted 27 April, 2023; originally announced April 2023.

Comments: Accepted at ISIT 2023

arXiv:2303.06438 [pdf, ps, other]

doi 10.1109/ICASSP49357.2023.10096702

On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals

Authors: Gary C. F. Lee, Amir Weiss, Alejandro Lancho, Yury Polyanskiy, Gregory W. Wornell

Abstract: We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals, which are ubiquitous in many modern-day digital communication systems. Related efforts have been pursued in monaural source separation, where state-of-the-art neural architectures have been adopted to train an end-to-end separator for audio signals (as 1-dimensional time series). In this work, through a prototype problem based on the OFDM source model, we assess -- and question -- the efficacy of using audio-oriented neural architectures in separating signals based on features pertinent to communication waveforms. Perhaps surprisingly, we demonstrate that in some configurations, where perfect separation is theoretically attainable, these audio-oriented neural architectures perform poorly in separating co-channel OFDM waveforms. Yet, we propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures, that can confer about 30 dB improvement in performance.

Submitted 15 March, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

arXiv:2302.08077 [pdf, other]

Group Fairness with Uncertainty in Sensitive Attributes

Authors: Abhin Shah, Maohao Shen, Jongha Jon Ryu, Subhro Das, Prasanna Sattigeri, Yuheng Bu, Gregory W. Wornell

Abstract: Learning a fair predictive model is crucial to mitigate biased decisions against minority groups in high-stakes applications. A common approach to learn such a model involves solving an optimization problem that maximizes the predictive power of the model under an appropriate group fairness constraint. However, in practice, sensitive attributes are often missing or noisy resulting in uncertainty. We demonstrate that solely enforcing fairness constraints on uncertain sensitive attributes can fall significantly short in achieving the level of fairness of models trained without uncertainty. To overcome this limitation, we propose a bootstrap-based algorithm that achieves the target level of fairness despite the uncertainty in sensitive attributes. The algorithm is guided by a Gaussian analysis for the independence notion of fairness where we propose a robust quadratically constrained quadratic problem to ensure a strict fairness guarantee with uncertain sensitive attributes. Our algorithm is applicable to both discrete and continuous sensitive attributes and is effective in real-world classification and regression tasks for various group fairness notions, e.g., independence and separation.

Submitted 7 June, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2212.07359 [pdf, other]

Post-hoc Uncertainty Learning using a Dirichlet Meta-Model

Authors: Maohao Shen, Yuheng Bu, Prasanna Sattigeri, Soumya Ghosh, Subhro Das, Gregory Wornell

Abstract: It is known that neural networks have the problem of being over-confident when directly using the output label distribution to generate uncertainty measures. Existing methods mainly resolve this issue by retraining the entire model to impose the uncertainty quantification capability so that the learned model can achieve desired performance in accuracy and uncertainty prediction simultaneously. However, training the model from scratch is computationally expensive and may not be feasible in many situations. In this work, we consider a more practical post-hoc uncertainty learning setting, where a well-trained base model is given, and we focus on the uncertainty quantification task at the second stage of training. We propose a novel Bayesian meta-model to augment pre-trained models with better uncertainty quantification abilities, which is effective and computationally efficient. Our proposed method requires no additional training data and is flexible enough to quantify different uncertainties and easily adapt to different application settings, including out-of-domain data detection, misclassification detection, and trustworthy transfer learning. We demonstrate our proposed meta-model approach's flexibility and superior empirical performance on these applications over multiple representative image classification benchmarks.

Submitted 14 December, 2022; originally announced December 2022.

Comments: Accepted by AAAI 2023

arXiv:2211.08209 [pdf, other]

On counterfactual inference with unobserved confounding

Authors: Abhin Shah, Raaz Dwivedi, Devavrat Shah, Gregory W. Wornell

Abstract: Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one $p$-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes as well as exacerbates the heterogeneity across units. Modeling the conditional distribution of the outcomes as an exponential family, we reduce learning the unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are $s$-sparse linear combination of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing covariates.

Submitted 14 September, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

arXiv:2210.09864 [pdf, ps, other]

Information-theoretic Characterizations of Generalization Error for the Gibbs Algorithm

Authors: Gholamali Aminian, Yuheng Bu, Laura Toni, Miguel R. D. Rodrigues, Gregory W. Wornell

Abstract: Various approaches have been developed to upper bound the generalization error of a supervised learning algorithm. However, existing bounds are often loose and even vacuous when evaluated in practice. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contributions are exact characterizations of the expected generalization error of the well-known Gibbs algorithm (a.k.a. Gibbs posterior) using different information measures, in particular, the symmetrized KL information between the input training samples and the output hypothesis. Our result can be applied to tighten existing expected generalization error and PAC-Bayesian bounds. Our information-theoretic approach is versatile, as it also characterizes the generalization error of the Gibbs algorithm with a data-dependent regularizer and that of the Gibbs algorithm in the asymptotic regime, where it converges to the standard empirical risk minimization algorithm. Of particular relevance, our results highlight the role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.

Submitted 18 October, 2022; originally announced October 2022.

Comments: under review. arXiv admin note: text overlap with arXiv:2107.13656, arXiv:2111.01635

arXiv:2209.10077 [pdf, other]

Can Shadows Reveal Biometric Information?

Authors: Safa C. Medin, Amir Weiss, Frédo Durand, William T. Freeman, Gregory W. Wornell

Abstract: We study the problem of extracting biometric information of individuals by looking at shadows of objects cast on diffuse surfaces. We show that the biometric information leakage from shadows can be sufficient for reliable identity inference under representative scenarios via a maximum likelihood analysis. We then develop a learning-based method that demonstrates this phenomenon in real settings, exploiting the subtle cues in the shadows that are the source of the leakage without requiring any labeled real data. In particular, our approach relies on building synthetic scenes composed of 3D face models obtained from a single photograph of each identity. We transfer what we learn from the synthetic data to the real data using domain adaptation in a completely unsupervised way. Our model is able to generalize well to the real domain and is robust to several variations in the scenes. We report high classification accuracies in an identity classification task that takes place in a scene with unknown geometry and occluding objects.

Submitted 4 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

arXiv:2209.04871 [pdf, other]

doi 10.1109/GLOBECOM48099.2022.10001513

Data-Driven Blind Synchronization and Interference Rejection for Digital Communication Signals

Authors: Alejandro Lancho, Amir Weiss, Gary C. F. Lee, Jennifer Tang, Yuheng Bu, Yury Polyanskiy, Gregory W. Wornell

Abstract: We study the potential of data-driven deep learning methods for separation of two communication signals from an observation of their mixture. In particular, we assume knowledge on the generation process of one of the signals, dubbed signal of interest (SOI), and no knowledge on the generation process of the second signal, referred to as interference. This form of the single-channel source separation problem is also referred to as interference rejection. We show that capturing high-resolution temporal structures (nonstationarities), which enables accurate synchronization to both the SOI and the interference, leads to substantial performance gains. With this key insight, we propose a domain-informed neural network (NN) design that is able to improve upon both "off-the-shelf" NNs and classical detection and interference rejection methods, as demonstrated in our simulations. Our findings highlight the key role communication-specific domain knowledge plays in the development of data-driven approaches that hold the promise of unprecedented gains.

Submitted 11 September, 2022; originally announced September 2022.

Comments: 9 pages, 6 figures, accepted at IEEE GLOBECOM 2022 (this version contains extended proofs)

arXiv:2208.10325 [pdf, other]

doi 10.1109/MLSP55214.2022.9943311

Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation

Authors: Gary C. F. Lee, Amir Weiss, Alejandro Lancho, Jennifer Tang, Yuheng Bu, Yury Polyanskiy, Gregory W. Wornell

Abstract: We study the problem of single-channel source separation (SCSS), and focus on cyclostationary signals, which are particularly suitable in a variety of application domains. Unlike classical SCSS approaches, we consider a setting where only examples of the sources are available rather than their models, inspiring a data-driven approach. For source models with underlying cyclostationary Gaussian constituents, we establish a lower bound on the attainable mean squared error (MSE) for any separation method, model-based or data-driven. Our analysis further reveals the operation for optimal separation and the associated implementation challenges. As a computationally attractive alternative, we propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator. We demonstrate in simulation that, with suitable domain-informed architectural choices, our U-Net method can approach the optimal performance with substantially reduced computational burden.

Submitted 22 August, 2022; originally announced August 2022.

arXiv:2207.10222 [pdf, other]

Direct Localization in Underwater Acoustics via Convolutional Neural Networks: A Data-Driven Approach

Authors: Amir Weiss, Toros Arikan, Gregory W. Wornell

Abstract: Direct localization (DLOC) methods, which use the observed data to localize a source at an unknown position in a one-step procedure, generally outperform their indirect two-step counterparts (e.g., using time-difference of arrivals). However, underwater acoustic DLOC methods require prior knowledge of the environment, and are computationally costly, hence slow. We propose what is, to the best of our knowledge, the first data-driven DLOC method. Inspired by classical and contemporary optimal model-based DLOC solutions, and leveraging the capabilities of convolutional neural networks (CNNs), we devise a holistic CNN-based solution. Our method includes a specifically tailored input structure, architecture, loss function, and a progressive training procedure, which are of independent interest in the broader context of machine learning. We demonstrate that our method outperforms attractive alternatives, and asymptotically matches the performance of an oracle optimal model-based solution.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2202.12150 [pdf, ps, other]

Tighter Expected Generalization Error Bounds via Convexity of Information Measures

Authors: Gholamali Aminian, Yuheng Bu, Gregory Wornell, Miguel Rodrigues

Abstract: Generalization error bounds are essential to understanding machine learning algorithms. This paper presents novel expected generalization error upper bounds based on the average joint distribution between the output hypothesis and each input training sample. Multiple generalization error upper bounds based on different information measures are provided, including Wasserstein distance, total variation distance, KL divergence, and Jensen-Shannon divergence. Due to the convexity of the information measures, the proposed bounds in terms of Wasserstein distance and total variation distance are shown to be tighter than their counterparts based on individual samples in the literature. An example is provided to demonstrate the tightness of the proposed generalization error bounds.

Submitted 24 February, 2022; originally announced February 2022.

Comments: 10 pages, 1 figure

arXiv:2202.00796 [pdf, other]

On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation

Authors: Maohao Shen, Yuheng Bu, Gregory Wornell

Abstract: Due to privacy, storage, and other constraints, there is a growing need for unsupervised domain adaptation techniques in machine learning that do not require access to the data used to train a collection of source models. Existing methods for multi-source-free domain adaptation (MSFDA) typically train a target model using pseudo-labeled data produced by the source models, and focus on improving the pseudo-labeling techniques or proposing new training objectives. Instead, we aim to analyze the fundamental limits of MSFDA. In particular, we develop an information-theoretic bound on the generalization error of the resulting target model, which illustrates an inherent bias-variance trade-off. We then provide insights on how to balance this trade-off from three perspectives, including domain aggregation, selective pseudo-labeling, and joint feature alignment, which leads to the design of novel algorithms. Experiments on multiple datasets validate our theoretical analysis and demonstrate the state-of-the-art performance of the proposed algorithm, especially on some of the most challenging datasets, including Office-Home and DomainNet.

Submitted 31 May, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

Comments: ICML 2023

arXiv:2111.01635 [pdf, ps, other]

Characterizing and Understanding the Generalization Error of Transfer Learning with Gibbs Algorithm

Authors: Yuheng Bu, Gholamali Aminian, Laura Toni, Miguel Rodrigues, Gregory Wornell

Abstract: We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms by focusing on two popular transfer learning approaches, $α$-weighted-ERM and two-stage-ERM. Our key result is an exact characterization of the generalization behaviour using the conditional symmetrized KL information between the output hypothesis and the target training samples given the source samples. Our results can also be applied to provide novel distribution-free generalization error upper bounds on these two aforementioned Gibbs algorithms. Our approach is versatile, as it also characterizes the generalization errors and excess risks of these two Gibbs algorithms in the asymptotic regime, where they converge to the $α$-weighted-ERM and two-stage-ERM, respectively. Based on our theoretical results, we show that the benefits of transfer learning can be viewed as a bias-variance trade-off, with the bias induced by the source distribution and the variance induced by the lack of target samples. We believe this viewpoint can guide the choice of transfer learning algorithms in practice.

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2110.15403 [pdf, other]

Selective Regression Under Fairness Criteria

Authors: Abhin Shah, Yuheng Bu, Joshua Ka-Wing Lee, Subhro Das, Rameswar Panda, Prasanna Sattigeri, Gregory W. Wornell

Abstract: Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, then the proposed fairness criteria are met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) by regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) by regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.

Submitted 14 July, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

arXiv:2110.15397 [pdf, ps, other]

A Computationally Efficient Method for Learning Exponential Family Distributions

Authors: Abhin Shah, Devavrat Shah, Gregory W. Wornell

Abstract: We consider the question of learning the natural parameters of a $k$-parameter minimal exponential family from i.i.d. samples in a computationally and statistically efficient manner. We focus on the setting where the support as well as the natural parameters are appropriately bounded. While the traditional maximum likelihood estimator for this class of exponential family is consistent, asymptotically normal, and asymptotically efficient, evaluating it is computationally hard. In this work, we propose a computationally efficient estimator that is consistent as well as asymptotically normal under mild conditions. We provide finite sample guarantees to achieve an ($\ell_2$) error of $α$ in the parameter estimation with sample complexity $O(\mathrm{poly}(k/α))$ and computational complexity $O(\mathrm{poly}(k/α))$. To establish these results, we show that, at the population level, our method can be viewed as the maximum likelihood estimation of a re-parameterized distribution belonging to the same class of exponential family.

Submitted 28 October, 2021; originally announced October 2021.

arXiv:2110.06183 [pdf, other]

Blind Modulo Analog-to-Digital Conversion of Vector Processes

Authors: Amir Weiss, Everest Huang, Or Ordentlich, Gregory W. Wornell

Abstract: In a growing number of applications, there is a need to digitize a (possibly high) number of correlated signals whose spectral characteristics are challenging for traditional analog-to-digital converters (ADCs). Examples, among others, include multiple-input multiple-output systems where the ADCs must acquire at once several signals at a very wide but sparsely and dynamically occupied bandwidth supporting diverse services. In such scenarios, the resolution requirements can be prohibitively high. As an alternative, the recently proposed modulo-ADC architecture can in principle require dramatically fewer bits in the conversion to obtain the target fidelity, but requires that spatiotemporal information be known and explicitly taken into account by the analog and digital processing in the converter, which is frequently impractical. Building on our recent work, we address this limitation and develop a blind version of the architecture that requires no such knowledge in the converter. In particular, it features an automatic modulo-level adjustment and a fully adaptive modulo-decoding mechanism, allowing it to asymptotically match the characteristics of the unknown input signal. Simulation results demonstrate the successful operation of the proposed algorithm.

Submitted 12 October, 2021; originally announced October 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2108.08937

arXiv:2108.13027 [pdf, other]

What You Can Learn by Staring at a Blank Wall

Authors: Prafull Sharma, Miika Aittala, Yoav Y. Schechner, Antonio Torralba, Gregory W. Wornell, William T. Freeman, Fredo Durand

Abstract: We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room. Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene. We use this signal to classify between zero, one, or two moving people, or the activity of a person in the hidden scene. We train two convolutional neural networks using data collected from 20 different scenes, and achieve an accuracy of $\approx94\%$ for both tasks in unseen test environments and real-time online settings. Unlike other passive non-line-of-sight methods, the technique does not rely on known occluders or controllable light sources, and generalizes to unknown rooms with no re-calibration. We analyze the generalization and robustness of our method with both real and synthetic data, and study the effect of the scene parameters on the signal quality.

Submitted 30 August, 2021; originally announced August 2021.

arXiv:2107.13656 [pdf, ps, other]

Characterizing the Generalization Error of Gibbs Algorithm with Symmetrized KL information

Authors: Gholamali Aminian, Yuheng Bu, Laura Toni, Miguel R. D. Rodrigues, Gregory Wornell

Abstract: Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory, and various approaches have been developed. However, existing bounds are often loose and lack guarantees. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm in terms of symmetrized KL information between the input training samples and the output hypothesis. Such a result can be applied to tighten existing expected generalization error bounds. Our analysis provides more insight on the fundamental role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.

Submitted 28 July, 2021; originally announced July 2021.

Comments: The first and second author have contributed equally to the paper. This paper is accepted in the ICML-21 Workshop on Information-Theoretic Methods for Rigorous, Responsible, and Reliable Machine Learning: https://sites.google.com/view/itr3/schedule

arXiv:2012.15259 [pdf, other]

A Maximal Correlation Approach to Imposing Fairness in Machine Learning

Authors: Joshua Lee, Yuheng Bu, Prasanna Sattigeri, Rameswar Panda, Gregory Wornell, Leonid Karlinsky, Rogerio Feris

Abstract: As machine learning algorithms grow in popularity and diversify to many industries, ethical and legal concerns regarding their fairness have become increasingly relevant. We explore the problem of algorithmic fairness, taking an information-theoretic view. The maximal correlation framework is introduced for expressing fairness constraints and shown to be capable of being used to derive regularizers that enforce independence and separation-based fairness criteria. These regularizers admit optimization algorithms for both discrete and continuous variables that are more computationally efficient than existing algorithms. We show that these algorithms provide smooth performance-fairness tradeoff curves and perform competitively with state-of-the-art methods on both discrete datasets (COMPAS, Adult) and continuous datasets (Communities and Crimes).

Submitted 30 December, 2020; originally announced December 2020.

Comments: 9 pages, 4 figures

arXiv:2010.15031 [pdf, ps, other]

On Learning Continuous Pairwise Markov Random Fields

Authors: Abhin Shah, Devavrat Shah, Gregory W. Wornell

Abstract: We consider learning a sparse pairwise Markov Random Field (MRF) with continuous-valued variables from i.i.d. samples. We adapt the algorithm of Vuffray et al. (2019) to this setting and provide finite-sample analysis revealing sample complexity scaling logarithmically with the number of variables, as in the discrete and Gaussian settings. Our approach is applicable to a large class of pairwise MRFs with continuous variables and also has desirable asymptotic properties, including consistency and normality under mild conditions. Further, we establish that the population version of the optimization criterion employed in Vuffray et al. (2019) can be interpreted as local maximum likelihood estimation (MLE). As part of our analysis, we introduce a robust variation of sparse linear regression à la Lasso, which may be of interest in its own right.

Submitted 28 October, 2020; originally announced October 2020.

arXiv:1912.02314 [pdf, other]

Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization

Authors: Miika Aittala, Prafull Sharma, Lukas Murmann, Adam B. Yedidia, Gregory W. Wornell, William T. Freeman, Fredo Durand

Abstract: We recover a video of the motion taking place in a hidden scene by observing changes in indirect illumination in a nearby uncalibrated visible region. We solve this problem by factoring the observed video into a matrix product between the unknown hidden scene video and an unknown light transport matrix. This task is extremely ill-posed, as any non-negative factorization will satisfy the data. Inspired by recent work on the Deep Image Prior, we parameterize the factor matrices using randomly initialized convolutional neural networks trained in a one-off manner, and show that this results in decompositions that reflect the true motion in the hidden scene.

Submitted 4 December, 2019; originally announced December 2019.

Comments: 14 pages, 5 figures, Advances in Neural Information Processing Systems 2019

Journal ref: Aittala, Miika, et al. "Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization." Advances in Neural Information Processing Systems. 2019

arXiv:1911.09105 [pdf, other]

On Universal Features for High-Dimensional Learning and Inference

Authors: Shao-Lun Huang, Anuran Makur, Gregory W. Wornell, Lizhong Zheng

Abstract: We consider the problem of identifying universal low-dimensional features from high-dimensional data for inference tasks in settings involving learning. For such problems, we introduce natural notions of universality and we show a local equivalence among them. Our analysis is naturally expressed via information geometry, and represents a conceptually and computationally useful analysis. The development reveals the complementary roles of the singular value decomposition, Hirschfeld-Gebelein-Rényi maximal correlation, the canonical correlation and principal component analyses of Hotelling and Pearson, Tishby's information bottleneck, Wyner's common information, Ky Fan $k$-norms, and Breiman and Friedman's alternating conditional expectations algorithm. We further illustrate how this framework facilitates understanding and optimizing aspects of learning systems, including multinomial logistic (softmax) regression and the associated neural network architecture, matrix factorization methods for collaborative filtering and other applications, rank-constrained multivariate linear regression, and forms of semi-supervised learning.

Submitted 20 November, 2019; originally announced November 2019.

arXiv:1911.08034 [pdf, other]

Super-Nyquist Rateless Coding for Intersymbol Interference Channels

Authors: Uri Erez, Gregory W. Wornell

Abstract: A rateless transmission architecture is developed for communication over Gaussian intersymbol interference channels, based on the concept of super-Nyquist (SNQ) signaling. In such systems, the signaling rate is chosen significantly higher than the Nyquist rate of the system. We show that such signaling, when used in conjunction with good "off-the-shelf" base codes, simple linear redundancy, and minimum mean-square error decision feedback equalization, results in capacity-approaching, low-complexity rateless codes for the time-varying intersymbol-interference channel. Constructions for both single-input / single-output (SISO) and multi-input / multi-output (MIMO) ISI channels are developed.

Submitted 18 November, 2019; originally announced November 2019.

arXiv:1905.06600 [pdf, other]

An Information Theoretic Interpretation to Deep Neural Networks

Authors: Shao-Lun Huang, Xiangxiang Xu, Lizhong Zheng, Gregory W. Wornell

Abstract: It is commonly believed that the hidden layers of deep neural networks (DNNs) attempt to extract informative features for learning tasks. In this paper, we formalize this intuition by showing that the features extracted by DNNs coincide with the result of an optimization problem, which we call the "universal feature selection" problem, in a local analysis regime. We interpret the training of weights in DNNs as the projection of feature functions between feature spaces, specified by the network structure. Our formulation has direct operational meaning in terms of the performance for inference tasks, and gives interpretations to the internal computation results of DNNs. Results of numerical experiments are provided to support the analysis.

Submitted 16 May, 2019; originally announced May 2019.

Comments: Accepted to ISIT 2019

arXiv:1811.05443 [pdf, other]

Co-regularized Alignment for Unsupervised Domain Adaptation

Authors: Abhishek Kumar, Prasanna Sattigeri, Kahini Wadhawan, Leonid Karlinsky, Rogerio Feris, William T. Freeman, Gregory Wornell

Abstract: Deep neural networks, trained with large amounts of labeled data, can fail to generalize well when tested with examples from a target domain whose distribution differs from the training data distribution, referred to as the source domain. It can be expensive or even infeasible to obtain the required amount of labeled data in all possible domains. Unsupervised domain adaptation sets out to address this problem, aiming to learn a good predictive model for the target domain using labeled examples from the source domain but only unlabeled examples from the target domain. Domain alignment approaches this problem by matching the source and target feature distributions, and has been used as a key component in many state-of-the-art domain adaptation methods. However, matching the marginal feature distributions does not guarantee that the corresponding class conditional distributions will be aligned across the two domains. We propose co-regularized domain alignment for unsupervised domain adaptation, which constructs multiple diverse feature spaces and aligns source and target distributions in each of them individually, while encouraging that alignments agree with each other with regard to the class predictions on the unlabeled target examples. The proposed method is generic and can be used to improve any domain adaptation method which uses domain alignment. We instantiate it in the context of a recent state-of-the-art method and observe that it provides significant performance improvements on several domain adaptation benchmarks.

Submitted 13 November, 2018; originally announced November 2018.

Comments: NIPS 2018 accepted version

arXiv:1810.07014 [pdf, other]

Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss

Authors: Amichai Painsky, Gregory W. Wornell

Abstract: A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if a minimizer of the expected loss is the true underlying probability. We show that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant. This implies that by minimizing the logarithmic loss associated with the KL divergence, we minimize an upper bound to any choice of loss from this set. As such the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures. Importantly, this notion of universality is not problem-specific, enabling its use in diverse applications, including predictive modeling, data clustering and sample complexity analysis. Generalizations to arbitrary finite alphabets are also developed. The derived inequalities extend several well-known $f$-divergence results.

Submitted 2 January, 2020; v1 submitted 14 October, 2018; originally announced October 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1805.03804

arXiv:1806.08968 [pdf, ps, other]

doi 10.1109/JSTSP.2018.2863189

A Modulo-Based Architecture for Analog-to-Digital Conversion

Authors: Or Ordentlich, Gizem Tabak, Pavan Kumar Hanumolu, Andrew C. Singer, Gregory W. Wornell

Abstract: Systems that capture and process analog signals must first acquire them through an analog-to-digital converter. While subsequent digital processing can remove statistical correlations present in the acquired data, the dynamic range of the converter is typically scaled to match that of the input analog signal. The present paper develops an approach for analog-to-digital conversion that aims at minimizing the number of bits per sample at the output of the converter. This is attained by reducing the dynamic range of the analog signal by performing a modulo operation on its amplitude, and then quantizing the result. While the converter itself is universal and agnostic of the statistics of the signal, the decoder operation on the output of the quantizer can exploit the statistical structure in order to unwrap the modulo folding. The performance of this method is shown to approach information theoretical limits, as captured by the rate-distortion function, in various settings. An architecture for modulo analog-to-digital conversion via ring oscillators is suggested, and its merits are numerically demonstrated.

Submitted 23 June, 2018; originally announced June 2018.
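
The fold-then-quantize idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's design: the names fold, quantize, encode and decode are hypothetical, the test signal is a slowly varying sinusoid, and the decoder uses the previous reconstructed sample as its predictor in place of the statistical decoder the abstract refers to.

```python
import math

def fold(x, delta):
    """Centered modulo: map x into the interval [-delta/2, delta/2)."""
    return (x + delta / 2) % delta - delta / 2

def quantize(v, step):
    """Uniform quantizer with the given step size."""
    return step * round(v / step)

def encode(samples, delta, step):
    """Fold each sample into the reduced range, then quantize it."""
    return [quantize(fold(x, delta), step) for x in samples]

def decode(codes, delta, start=0.0):
    """Unwrap the folding, using the previous output as a predictor.

    Assumption: consecutive samples differ by less than delta/2
    (including quantization error), so the folded difference is
    the true difference.
    """
    out, pred = [], start
    for y in codes:
        pred = pred + fold(y - pred, delta)
        out.append(pred)
    return out

# Hypothetical test signal: amplitude 3, far larger than the folding range.
samples = [3.0 * math.sin(0.1 * n) for n in range(200)]
delta, step = 1.0, 0.05
codes = encode(samples, delta, step)
recon = decode(codes, delta)
max_err = max(abs(a - b) for a, b in zip(samples, recon))
```

In this sketch the quantizer only has to cover a range of width delta even though the signal swings between -3 and 3, and the reconstruction error stays within half a quantization step as long as the predictor assumption holds.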

arXiv:1805.03804 [pdf, other]

On the Universality of the Logistic Loss Function

Authors: Amichai Painsky, Gregory W. Wornell

Abstract: A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex, and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. It implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound to any choice of loss function from this set. This property justifies the broad use of log-loss in regression, decision trees, deep neural networks and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.

Submitted 10 May, 2018; originally announced May 2018.
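
One concrete instance of the kind of bound described in the abstract above is the quadratic loss in the binary case, where the associated divergence is (p - q)^2 and Pinsker's inequality gives (p - q)^2 <= KL(p || q) / 2 in nats. The sketch below checks this numerically on a grid. The function names and the grid are illustrative assumptions, and the check covers only this one loss, not the paper's general result.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence D(p || q) between Bernoulli(p) and Bernoulli(q), in nats."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * math.log(a / b)
    return total

def quadratic_divergence(p, q):
    """Divergence induced by the quadratic loss for one binary outcome."""
    return (p - q) ** 2

# Hypothetical grid of probabilities strictly inside (0, 1).
grid = [i / 20 for i in range(1, 20)]
worst_ratio = max(
    quadratic_divergence(p, q) / kl_bernoulli(p, q)
    for p in grid
    for q in grid
    if p != q
)
```

On this grid worst_ratio stays below 1/2, which is the normalization constant that Pinsker's inequality supplies for the quadratic loss.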

arXiv:1708.02501 [pdf, other]

Covert Communication with Channel-State Information at the Transmitter

Authors: Si-Hyeon Lee, Ligong Wang, Ashish Khisti, Gregory W. Wornell

Abstract: We consider the problem of covert communication over a state-dependent channel, where the transmitter has causal or noncausal knowledge of the channel states. Here, "covert" means that a warden on the channel should observe similar statistics when the transmitter is sending a message and when it is not. When a sufficiently long secret key is shared between the transmitter and the receiver, we derive closed-form formulas for the maximum achievable covert communication rate ("covert capacity") for discrete memoryless channels and, when the transmitter's channel-state information (CSI) is noncausal, for additive white Gaussian noise (AWGN) channels. For certain channel models, including the AWGN channel, we show that the covert capacity is positive with CSI at the transmitter, but is zero without CSI. We also derive lower bounds on the rate of the secret key that is needed for the transmitter and the receiver to achieve the covert capacity.

Submitted 8 August, 2017; originally announced August 2017.

Comments: 20 pages, 3 figures, a shorter version presented at IEEE ISIT 2017, submitted to IEEE Transactions on Information Forensics and Security

arXiv:1706.09387 [pdf, other]

Asynchronous Massive Access and Neighbor Discovery Using OFDMA

Authors: Xu Chen, Lina Liu, Dongning Guo, Gregory W. Wornell

Abstract: The fundamental communication problem in the wireless Internet of Things (IoT) is to discover a massive number of devices and to allow them reliable access to shared channels. Oftentimes these devices transmit short messages randomly and sporadically. This paper proposes a novel signaling scheme for grant-free massive access, where each device encodes its identity and/or information in a sparse set of tones. Such transmissions are implemented in the form of orthogonal frequency-division multiple access (OFDMA). Under some mild conditions and assuming device delays to be bounded unknown multiples of symbol intervals, sparse OFDMA is proved to enable arbitrarily reliable asynchronous device identification and message decoding with a codelength that is O(K(log K + log S + log N)), where N denotes the device population, K denotes the actual number of active devices, and log S is essentially equal to the number of bits a device can send (including its identity). By exploiting the Fast Fourier Transform (FFT), the computational complexity for discovery and decoding can be made to be sub-linear in the total device population. To prove the concept, a specific design is proposed to identify up to 100 active devices out of $2^{38}$ possible devices with up to 20 symbols of delay and moderate signal-to-noise ratios and fading. The codelength compares much more favorably with those of standard slotted ALOHA and carrier-sensing multiple access (CSMA) schemes.

Submitted 19 November, 2021; v1 submitted 28 June, 2017; originally announced June 2017.

arXiv:1705.06616 [pdf, other]

Sensor Array Design Through Submodular Optimization

Authors: Gal Shulkind, Stefanie Jegelka, Gregory W. Wornell

Abstract: We consider the problem of far-field sensing by means of a sensor array. Traditional array geometry design techniques are agnostic to prior information about the far-field scene. However, in many applications such priors are available and may be utilized to design more efficient array topologies. We formulate the problem of array geometry design with scene prior as one of finding a sampling configuration that enables efficient inference, which turns out to be a combinatorial optimization problem. While generic combinatorial optimization problems are NP-hard and resist efficient solvers, we show how for array design problems the theory of submodular optimization may be utilized to obtain efficient algorithms that are guaranteed to achieve solutions within a constant approximation factor from the optimum. We leverage the connection between array design problems and submodular optimization and port several results of interest. We demonstrate efficient methods for designing arrays with constraints on the sensing aperture, as well as arrays respecting combinatorial placement constraints. This novel connection between array design and submodularity suggests the possibility of utilizing other insights and techniques from the growing body of literature on submodular optimization in the field of array design.

Submitted 28 December, 2017; v1 submitted 18 May, 2017; originally announced May 2017.
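
As a rough illustration of why submodularity yields efficient algorithms with constant-factor guarantees, the sketch below runs the classic greedy rule on a toy coverage objective. The greedy rule is known to achieve a (1 - 1/e) factor for monotone submodular objectives under a cardinality constraint. The coverage objective, the site names and the regions table are hypothetical stand-ins; the abstract does not state the objective the authors actually optimize.

```python
def coverage(chosen, regions):
    """Toy submodular objective: number of scene cells covered by the chosen sites."""
    covered = set()
    for site in chosen:
        covered |= regions[site]
    return len(covered)

def greedy_select(regions, budget):
    """Pick up to `budget` sites, each time adding the largest marginal gain."""
    chosen = []
    for _ in range(budget):
        base = coverage(chosen, regions)
        best, best_gain = None, 0
        for site in sorted(regions):  # sorted for a deterministic tie-break
            if site in chosen:
                continue
            gain = coverage(chosen + [site], regions) - base
            if gain > best_gain:
                best, best_gain = site, gain
        if best is None:  # no remaining site adds coverage
            break
        chosen.append(best)
    return chosen

# Hypothetical candidate sensor sites and the scene cells each one observes.
regions = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 6},
}
picked = greedy_select(regions, budget=2)
```

Each greedy step needs only one pass over the remaining candidates, so the cost grows with the budget times the number of sites instead of with the number of possible subsets.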

arXiv:1610.02578 [pdf, other]

Defect tolerance: fundamental limits and examples

Authors: Jennifer Tang, Da Wang, Yury Polyanskiy, Gregory Wornell

Abstract: This paper addresses the problem of adding redundancy to a collection of physical objects so that the overall system is more robust to failures. In contrast to its information counterpart, which can exploit parity to protect multiple information symbols from a single erasure, physical redundancy can only be realized through duplication and substitution of objects. We propose a bipartite graph model for designing defect-tolerant systems in which defective objects are replaced by judiciously connected redundant objects. The fundamental limits of this model are characterized under various asymptotic settings and both asymptotic and finite-size systems that approach these limits are constructed. Among other results, we show that simple modular redundancy is in general suboptimal. As we develop, this combinatorial problem of defect tolerant system design has a natural interpretation as one of graph coloring, and the analysis is significantly different from that traditionally used in information redundancy for error-control codes.

Submitted 7 November, 2017; v1 submitted 8 October, 2016; originally announced October 2016.

arXiv:1511.08143 [pdf, other]

On Throughput-Smoothness Trade-offs in Streaming Communication

Authors: Gauri Joshi, Yuval Kochman, Gregory Wornell

Abstract: Unlike traditional file transfer where only total delay matters, streaming applications impose delay constraints on each packet and require them to be in order. To achieve fast in-order packet decoding, we have to compromise on the throughput. We study this trade-off between throughput and smoothness in packet decoding. We first consider point-to-point streaming and analyze how the trade-off is affected by the frequency of block-wise feedback, whereby the source receives full channel state feedback at periodic intervals. We show that frequent feedback can drastically improve the throughput-smoothness trade-off. Then we consider the problem of multicasting a packet stream to two users. For both point-to-point and multicast streaming, we propose a spectrum of coding schemes that span different throughput-smoothness trade-offs. One can choose an appropriate coding scheme from these, depending upon the delay-sensitivity and bandwidth limitations of the application. This work introduces a novel style of analysis using renewal processes and Markov chains to analyze coding schemes.

Submitted 25 November, 2015; originally announced November 2015.

arXiv:1510.04731 [pdf, other]

Efficient Replication of Queued Tasks for Latency Reduction in Cloud Systems

Authors: Gauri Joshi, Emina Soljanin, Gregory Wornell

Abstract: In cloud computing systems, assigning a job to multiple servers and waiting for the earliest copy to finish is an effective method to combat the variability in response time of individual servers. Although adding redundant replicas always reduces service time, the total computing time spent per job may be higher, thus increasing waiting time in queue. The total time spent per job is also proportional to the cost of computing resources. We analyze how different redundancy strategies, e.g., the number of replicas and the times when they are issued and canceled, affect the latency and computing cost. We get the insight that the log-concavity of the service time distribution is a key factor in determining whether adding redundancy reduces latency and cost. If the service distribution is log-convex, then adding maximum redundancy reduces both latency and cost. And if it is log-concave, then having fewer replicas and canceling the redundant requests early is more effective.

Submitted 15 October, 2015; originally announced October 2015.

Comments: presented at Allerton 2015. arXiv admin note: substantial text overlap with arXiv:1508.03599

arXiv:1508.03599 [pdf, other]

Efficient Redundancy Techniques for Latency Reduction in Cloud Systems

Authors: Gauri Joshi, Emina Soljanin, Gregory Wornell

Abstract: In cloud computing systems, assigning a task to multiple servers and waiting for the earliest copy to finish is an effective method to combat the variability in response time of individual servers, and reduce latency. But adding redundancy may result in higher cost of computing resources, as well as an increase in queueing delay due to higher traffic load. This work helps understand when and how redundancy gives a cost-efficient reduction in latency. For a general task service time distribution, we compare different redundancy strategies in terms of the number of redundant tasks, and the times when they are issued and canceled. We get the insight that the log-concavity of the task service time creates a dichotomy of when adding redundancy helps. If the service time distribution is log-convex (i.e., the log of the tail probability is convex), then adding maximum redundancy reduces both latency and cost. And if it is log-concave (i.e., the log of the tail probability is concave), then less redundancy and early cancellation of redundant tasks are more effective. Using these insights, we design a general redundancy strategy that achieves a good latency-cost trade-off for an arbitrary service time distribution. This work also generalizes and extends some results in the analysis of fork-join queues.

Submitted 12 April, 2017; v1 submitted 14 August, 2015; originally announced August 2015.

Comments: accepted for publication in ACM Transactions on Modeling and Performance Evaluation of Computing Systems
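
The log-convex versus log-concave dichotomy in the abstract above can be seen in a simplified closed-form sketch. The assumptions here are loud ones: queueing is ignored entirely, all r replicas start at once and are canceled when the first finishes (so cost is r times the expected minimum), and the two distributions, the parameter values and the function names are hypothetical choices for illustration, not the paper's model. A shifted exponential has a log-concave tail; a two-phase hyperexponential (a mixture of exponentials) has a log-convex tail.

```python
from math import comb

def shifted_exp(r, shift, rate):
    """Expected latency and cost with r replicas of a shift + Exp(rate) service time."""
    latency = shift + 1.0 / (r * rate)  # min of r exponentials has rate r * rate
    return latency, r * latency

def hyper_exp(r, p, rate_a, rate_b):
    """Expected latency and cost with r replicas of a hyperexponential service time.

    Tail: P(X > t) = p * exp(-rate_a * t) + (1 - p) * exp(-rate_b * t).
    E[min of r copies] is the integral of the tail raised to the power r,
    expanded with the binomial theorem.
    """
    latency = sum(
        comb(r, k) * p**k * (1 - p) ** (r - k) / (k * rate_a + (r - k) * rate_b)
        for k in range(r + 1)
    )
    return latency, r * latency

replicas = [1, 2, 4]
concave = [shifted_exp(r, shift=1.0, rate=1.0) for r in replicas]
convex = [hyper_exp(r, p=0.5, rate_a=10.0, rate_b=1.0) for r in replicas]
```

With these parameters, latency falls as r grows in both cases, but cost rises for the shifted exponential and falls for the hyperexponential, matching the direction of the dichotomy the abstract describes.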

arXiv:1506.03236 [pdf, other]

doi 10.1109/TIT.2016.2548471

Fundamental Limits of Communication with Low Probability of Detection

Authors: Ligong Wang, Gregory Wornell, Lizhong Zheng

Abstract: This paper considers the problem of communication over a discrete memoryless channel (DMC) or an additive white Gaussian noise (AWGN) channel subject to the constraint that the probability that an adversary who observes the channel outputs can detect the communication is low. Specifically, the relative entropy between the output distributions when a codeword is transmitted and when no input is provided to the channel must be sufficiently small. For a DMC whose output distribution induced by the "off" input symbol is not a mixture of the output distributions induced by other input symbols, it is shown that the maximum amount of information that can be transmitted under this criterion scales like the square root of the blocklength. The same is true for the AWGN channel. Exact expressions for the scaling constant are also derived.

Submitted 21 April, 2016; v1 submitted 10 June, 2015; originally announced June 2015.

Comments: Version to appear in IEEE Transactions on Information Theory; minor typos in v2 corrected. Part of this work was presented at ISIT 2015 in Hong Kong

arXiv:1503.03128 [pdf, other]

Efficient Straggler Replication in Large-scale Parallel Computing

Authors: Da Wang, Gauri Joshi, Gregory Wornell

Abstract: In a cloud computing job with many parallel tasks, the tasks on the slowest machines (straggling tasks) become the bottleneck in the job completion. Computing frameworks such as MapReduce and Spark tackle this by replicating the straggling tasks and waiting for any one copy to finish. Despite being adopted in practice, there is little analysis of how replication affects the latency and the cost of additional computing resources. In this paper we provide a framework to analyze this latency-cost trade-off and find the best replication strategy by answering design questions such as: 1) when to replicate straggling tasks, 2) how many replicas to launch, and 3) whether to kill the original copy or not. Our analysis reveals that for certain execution time distributions, a small amount of task replication can drastically reduce both the latency and the cost of computing resources. We also propose an algorithm to estimate the latency and cost based on the empirical distribution of task execution time. Evaluations using samples in the Google Cluster Trace suggest further latency and cost reduction compared to the existing replication strategy used in MapReduce.

Submitted 12 September, 2017; v1 submitted 10 March, 2015; originally announced March 2015.

Comments: Submitted to ACM Transactions on Modeling and Performance Evaluation of Computing Systems

ACM Class: C.4; F.2.2

arXiv:1407.5883 [pdf, ps, other]

doi 10.1109/TIT.2014.2331060

Toward Photon-Efficient Key Distribution over Optical Channels

Authors: Yuval Kochman, Ligong Wang, Gregory W. Wornell

Abstract: This work considers the distribution of a secret key over an optical (bosonic) channel in the regime of high photon efficiency, i.e., when the number of secret key bits generated per detected photon is high. While in principle the photon efficiency is unbounded, there is an inherent tradeoff between this efficiency and the key generation rate (with respect to the channel bandwidth). We derive asymptotic expressions for the optimal generation rates in the photon-efficient limit, and propose schemes that approach these limits up to certain approximations. The schemes are practical, in the sense that they use coherent or temporally-entangled optical states and direct photodetection, all of which are reasonably easy to realize in practice, in conjunction with off-the-shelf classical codes.

Submitted 22 July, 2014; originally announced July 2014.

Comments: In IEEE Transactions on Information Theory; same version except that labels are corrected for Schemes S-1, S-2, and S-3, which appear as S-3, S-4, and S-5 in the Transactions

Journal ref: IEEE Transactions on Information Theory, Vol. 60, No. 8, pp. 4958 - 4972, Aug. 2014

arXiv:1406.7435 [pdf, other]

doi 10.1109/TIT.2015.2485270

Compression in the Space of Permutations

Authors: Da Wang, Arya Mazumdar, Gregory Wornell

Abstract: We investigate lossy compression (source coding) of data in the form of permutations. This problem has direct applications in the storage of ordinal data or rankings, and in the analysis of sorting algorithms. We analyze the rate-distortion characteristic for the permutation space under the uniform distribution, and the minimum achievable rate of compression that allows a bounded distortion after recovery. Our analysis is with respect to different practical and useful distortion measures, including Kendall-tau distance, Spearman's footrule, Chebyshev distance and inversion-$\ell_1$ distance. We establish equivalence of source code designs under certain distortions and show simple explicit code designs that incur low encoding/decoding complexities and are asymptotically optimal. Finally, we show that for the Mallows model, a popular nonuniform ranking model on the permutation space, both the entropy and the maximum distortion at zero rate are much lower than the uniform counterparts, which motivates the future design of efficient compression schemes for this model.

Submitted 17 October, 2015; v1 submitted 28 June, 2014; originally announced June 2014.

Comments: accepted to IEEE Transactions on Information Theory
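
Three of the distortion measures named in the abstract above have short standard definitions, sketched below for rankings given as lists of distinct items. The function names and the example rankings are illustrative assumptions; the inversion-$\ell_1$ distance is left out because its definition is not given in the abstract.

```python
from itertools import combinations

def positions(ranking):
    """Map each item to its index in the ranking."""
    return {item: i for i, item in enumerate(ranking)}

def kendall_tau(a, b):
    """Number of item pairs that the two rankings order differently."""
    pos_a, pos_b = positions(a), positions(b)
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

def footrule(a, b):
    """Spearman's footrule: total displacement of items between the rankings."""
    pos_b = positions(b)
    return sum(abs(i - pos_b[item]) for i, item in enumerate(a))

def chebyshev(a, b):
    """Chebyshev distance: largest displacement of any single item."""
    pos_b = positions(b)
    return max(abs(i - pos_b[item]) for i, item in enumerate(a))

# Hypothetical rankings of five items.
reference = [1, 2, 3, 4, 5]
swapped = [2, 1, 3, 5, 4]   # two adjacent swaps
rotated = [5, 1, 2, 3, 4]   # last item moved to the front
```

For the swapped ranking the three distances are 2, 4 and 1; for the rotated ranking they are 4, 8 and 4, which shows how moving one item a long way is penalized differently by each measure.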

arXiv:1405.3697 [pdf, other]

Throughput-Smoothness Trade-offs in Multicasting of an Ordered Packet Stream

Authors: Gauri Joshi, Yuval Kochman, Gregory Wornell

Abstract: An increasing number of streaming applications need packets to be strictly in-order at the receiver. This paper provides a framework for analyzing in-order packet delivery in such applications. We consider the problem of multicasting an ordered stream of packets to two users over independent erasure channels with instantaneous feedback to the source. Depending upon the channel erasures, a packet which is in-order for one user may be redundant for the other. Thus there is an inter-dependence between throughput and the smoothness of in-order packet delivery to the two users. We use a Markov chain model of packet decoding to analyze these throughput-smoothness trade-offs of the users, and propose coding schemes that can span different points on each trade-off.

Submitted 14 May, 2014; originally announced May 2014.

Comments: Accepted to NetCod 2014

Showing 1–50 of 77 results for author: Wornell, G