Showing 1–50 of 125 results for author: Baraniuk, R G

Searching in archive cs.
  1. arXiv:2406.13781  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    A Primal-Dual Framework for Transformers and Neural Networks

    Authors: Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

    Abstract: Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that the self-attention corresp…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted to ICLR 2023, 26 pages, 4 figures, 14 tables

  2. arXiv:2405.13977  [pdf, other]

    cs.LG stat.ML

    Removing Bias from Maximum Likelihood Estimation with Model Autophagy

    Authors: Paul Mayer, Lorenzo Luzi, Ali Siahkoohi, Don H. Johnson, Richard G. Baraniuk

    Abstract: We propose autophagy penalized likelihood estimation (PLE), an unbiased alternative to maximum likelihood estimation (MLE) which is more fair and less susceptible to model autophagy disorder (madness). Model autophagy refers to models trained on their own output; PLE ensures the statistics of these outputs coincide with the data statistics. This enables PLE to be statistically unbiased in certain…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 9 Pages, submission for NeurIPS 2024

    MSC Class: 68T07

  3. arXiv:2405.08134  [pdf, other]

    cs.CL

    Many-Shot Regurgitation (MSR) Prompting

    Authors: Shashank Sonkar, Richard G. Baraniuk

    Abstract: We introduce Many-Shot Regurgitation (MSR) prompting, a new black-box membership inference attack framework for examining verbatim content reproduction in large language models (LLMs). MSR prompting involves dividing the input text into multiple segments and creating a single prompt that includes a series of faux conversation rounds between a user and a language model to elicit verbatim regurgitat…

    Submitted 13 May, 2024; originally announced May 2024.

  4. arXiv:2404.15156  [pdf, other]

    cs.CL

    Regressive Side Effects of Training Language Models to Mimic Student Misconceptions

    Authors: Shashank Sonkar, Naiming Liu, Richard G. Baraniuk

    Abstract: This paper presents a novel exploration into the regressive side effects of training Large Language Models (LLMs) to mimic student misconceptions for personalized education. We highlight the problem that as LLMs are trained to more accurately mimic student misconceptions, there is a compromise in the factual integrity and reasoning ability of the models. Our work involved training an LLM on a stud…

    Submitted 23 April, 2024; originally announced April 2024.

  5. arXiv:2404.14316  [pdf, other]

    cs.CL

    Automated Long Answer Grading with RiceChem Dataset

    Authors: Shashank Sonkar, Kangqi Ni, Lesa Tran Lu, Kristi Kincaid, John S. Hutchinson, Richard G. Baraniuk

    Abstract: We introduce a new area of study in the field of educational Natural Language Processing: Automated Long Answer Grading (ALAG). Distinguishing itself from Automated Short Answer Grading (ASAG) and Automated Essay Grading (AEG), ALAG presents unique challenges due to the complexity and multifaceted nature of fact-based long answers. To study ALAG, we introduce RiceChem, a dataset derived from a col…

    Submitted 22 April, 2024; originally announced April 2024.

  6. arXiv:2404.14301  [pdf, other]

    cs.CL

    Marking: Visual Grading with Highlighting Errors and Annotating Missing Bits

    Authors: Shashank Sonkar, Naiming Liu, Debshila B. Mallick, Richard G. Baraniuk

    Abstract: In this paper, we introduce "Marking", a novel grading task that enhances automated grading systems by performing an in-depth analysis of student responses and providing students with visual highlights. Unlike traditional systems that provide binary scores, "marking" identifies and categorizes segments of the student response as correct, incorrect, or irrelevant and detects omissions from gold ans…

    Submitted 22 April, 2024; originally announced April 2024.

  7. arXiv:2402.15989  [pdf, other]

    cs.AI eess.SY

    PIDformer: Transformer Meets Control Theory

    Authors: Tam Nguyen, César A. Uribe, Tan M. Nguyen, Richard G. Baraniuk

    Abstract: In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its solutions, leading to lower-rank outputs and diminished representation capacity. Moreover, the steady-state solution of the model is sensitive to input p…

    Submitted 25 February, 2024; originally announced February 2024.

  8. arXiv:2402.05000  [pdf, other]

    cs.CL

    Pedagogical Alignment of Large Language Models

    Authors: Shashank Sonkar, Kangqi Ni, Sapana Chaudhary, Richard G. Baraniuk

    Abstract: In this paper, we introduce the novel concept of pedagogically aligned Large Language Models (LLMs) that signifies a transformative shift in the application of LLMs within educational contexts. Rather than providing direct responses to user queries, pedagogically-aligned LLMs function as scaffolding tools, breaking complex problems into manageable subproblems and guiding students towards the final…

    Submitted 7 February, 2024; originally announced February 2024.

  9. arXiv:2401.14429  [pdf, ps, other]

    cs.LG cs.RO eess.SP stat.ML

    [Re] The Discriminative Kalman Filter for Bayesian Filtering with Nonlinear and Non-Gaussian Observation Models

    Authors: Josue Casco-Rodriguez, Caleb Kemere, Richard G. Baraniuk

    Abstract: Kalman filters provide a straightforward and interpretable means to estimate hidden or latent variables, and have found numerous applications in control, robotics, signal processing, and machine learning. One such application is neural decoding for neuroprostheses. In 2020, Burkhart et al. thoroughly evaluated their new version of the Kalman filter that leverages Bayes' theorem to improve filter p…

    Submitted 24 January, 2024; originally announced January 2024.

  10. arXiv:2312.00751  [pdf, other]

    cs.CL cs.AI

    Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals

    Authors: Tam Nguyen, Tan M. Nguyen, Richard G. Baraniuk

    Abstract: Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications. However, the representation capacity of a deep transformer model is degraded due to the over-smoothing issue in which the token representations become identical when the model's depth grows. In this work, we show that self-attention layers in transformers minimize a functi…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 24 pages

  11. arXiv:2310.02439  [pdf, other]

    cs.CL

    Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions

    Authors: Naiming Liu, Shashank Sonkar, Zichao Wang, Simon Woodhead, Richard G. Baraniuk

    Abstract: We propose novel evaluations for mathematical reasoning capabilities of Large Language Models (LLMs) based on mathematical misconceptions. Our primary approach is to simulate LLMs as a novice learner and an expert tutor, aiming to identify the incorrect answer to math question resulted from a specific misconception and to recognize the misconception(s) behind an incorrect answer, respectively. Con…

    Submitted 3 October, 2023; originally announced October 2023.

  12. arXiv:2310.00545  [pdf, other]

    eess.SP cs.CV eess.IV

    Implicit Neural Representations and the Algebra of Complex Wavelets

    Authors: T. Mitchell Roddenberry, Vishwanath Saragadam, Maarten V. de Hoop, Richard G. Baraniuk

    Abstract: Implicit neural representations (INRs) have arisen as useful methods for representing signals on Euclidean domains. By parameterizing an image as a multilayer perceptron (MLP) on Euclidean space, INRs effectively represent signals in a way that couples spatial and spectral features of the signal that is not obvious in the usual discrete representation, paving the way for continuous signal processi…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: 10 pages, 6 figures. 2 appendix pages, 1 appendix figure

  13. arXiv:2309.12161  [pdf, other]

    cs.CL

    Code Soliloquies for Accurate Calculations in Large Language Models

    Authors: Shashank Sonkar, MyCo Le, Xinghe Chen, Naiming Liu, Debshila Basu Mallick, Richard G. Baraniuk

    Abstract: High-quality conversational datasets are crucial for the successful development of Intelligent Tutoring Systems (ITS) that utilize a Large Language Model (LLM) backend. Synthetic student-teacher dialogues, generated using advanced GPT-4 models, are a common strategy for creating these datasets. However, subjects like physics that entail complex calculations pose a challenge. While GPT-4 presents i…

    Submitted 31 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

  14. arXiv:2307.01850  [pdf, other]

    cs.LG cs.AI cs.CV

    Self-Consuming Generative Models Go MAD

    Authors: Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk

    Abstract: Seismic advances in generative AI algorithms for imagery, text, and other data types has led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose properties are poorly understood. We conduct a thorough analytical and empirical analysis using state-of-the-art generative image models of three families of au…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 31 pages, 31 figures, pre-print

  15. arXiv:2305.14507  [pdf, other]

    cs.CL

    Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models

    Authors: Shashank Sonkar, Richard G. Baraniuk

    Abstract: We explore whether Large Language Models (LLMs) are capable of logical reasoning with distorted facts, which we call Deduction under Perturbed Evidence (DUPE). DUPE presents a unique challenge to LLMs since they typically rely on their parameters, which encode mostly accurate information, to reason and make inferences. However, in DUPE, LLMs must reason over manipulated or falsified evidence prese…

    Submitted 23 May, 2023; originally announced May 2023.

  16. arXiv:2305.13297  [pdf, other]

    cs.CL

    Investigating the Role of Feed-Forward Networks in Transformers Using Parallel Attention and Feed-Forward Net Design

    Authors: Shashank Sonkar, Richard G. Baraniuk

    Abstract: This paper investigates the key role of Feed-Forward Networks (FFNs) in transformer models by utilizing the Parallel Attention and Feed-Forward Net Design (PAF) architecture, and comparing it to their Series Attention and Feed-Forward Net Design (SAF) counterparts. Central to the effectiveness of PAF are two main assumptions regarding the FFN block and the attention block within a layer: 1) the pr…

    Submitted 25 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  17. arXiv:2305.13272  [pdf, other]

    cs.CL

    CLASS: A Design Framework for building Intelligent Tutoring Systems based on Learning Science principles

    Authors: Shashank Sonkar, Naiming Liu, Debshila Basu Mallick, Richard G. Baraniuk

    Abstract: We present a design framework called Conversational Learning with Analytical Step-by-Step Strategies (CLASS) for building advanced Intelligent Tutoring Systems (ITS) powered by high-performance Large Language Models (LLMs). The CLASS framework empowers ITS with two key capabilities. First, through a carefully curated scaffolding dataset, CLASS equips ITS with essential problem-solving strategies,…

    Submitted 25 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Paper accepted at EMNLP 2023

  18. arXiv:2301.05187  [pdf, other]

    cs.CV cs.GR eess.IV

    WIRE: Wavelet Implicit Neural Representations

    Authors: Vishwanath Saragadam, Daniel LeJeune, Jasper Tan, Guha Balakrishnan, Ashok Veeraraghavan, Richard G. Baraniuk

    Abstract: Implicit neural representations (INRs) have recently advanced numerous vision-related areas. INR performance depends strongly on the choice of the nonlinear activation function employed in its multilayer perceptron (MLP) network. A wide range of nonlinearities have been explored, but, unfortunately, current INRs designed to have high accuracy also suffer from poor robustness (to signal noise, para…

    Submitted 5 January, 2023; originally announced January 2023.

  19. arXiv:2212.09723  [pdf, other]

    cs.CL

    MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages

    Authors: Shashank Sonkar, Zichao Wang, Richard G. Baraniuk

    Abstract: This paper investigates the problem of Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged data samples. NER is a fundamental task in Natural Language Processing (NLP). A critical driver accelerating NER systems' progress is the existence of large-scale language corpora that enable NER systems to achieve outstanding performance in languages such as Engl…

    Submitted 19 December, 2022; originally announced December 2022.

  20. arXiv:2212.06345  [pdf, other]

    physics.optics cs.CV

    Foveated Thermal Computational Imaging in the Wild Using All-Silicon Meta-Optics

    Authors: Vishwanath Saragadam, Zheyi Han, Vivek Boominathan, Luocheng Huang, Shiyu Tan, Johannes E. Fröch, Karl F. Böhringer, Richard G. Baraniuk, Arka Majumdar, Ashok Veeraraghavan

    Abstract: Foveated imaging provides a better tradeoff between situational awareness (field of view) and resolution and is critical in long-wavelength infrared regimes because of the size, weight, power, and cost of thermal sensors. We demonstrate computational foveated imaging by exploiting the ability of a meta-optical frontend to discriminate between different polarization states and a computational backe…

    Submitted 12 December, 2022; originally announced December 2022.

  21. arXiv:2211.11074  [pdf, other]

    cs.LG

    Frozen Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks

    Authors: Yehuda Dar, Lorenzo Luzi, Richard G. Baraniuk

    Abstract: We study the generalization behavior of transfer learning of deep neural networks (DNNs). We adopt the overparameterization perspective -- featuring interpolation of the training data (i.e., approximately zero train error) and the double descent phenomenon -- to explain the delicate effect of the transfer learning setting on generalization performance. We study how the generalization behavior of t…

    Submitted 12 June, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

  22. arXiv:2211.03751  [pdf, other]

    math.NA cs.DS math.ST

    Asymptotics of the Sketched Pseudoinverse

    Authors: Daniel LeJeune, Pratik Patil, Hamid Javadi, Richard G. Baraniuk, Ryan J. Tibshirani

    Abstract: We take a random matrix theory approach to random sketching and show an asymptotic first-order equivalence of the regularized sketched pseudoinverse of a positive semidefinite matrix to a certain evaluation of the resolvent of the same matrix. We focus on real-valued regularization and extend previous results on an asymptotic equivalence of random matrices to the real setting, providing a precise…

    Submitted 6 October, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: 45 pages, 9 figures

    MSC Class: 15B52; 46L54; 62J07

  23. arXiv:2210.12565  [pdf, other]

    cs.CL cs.LG

    A Visual Tour Of Current Challenges In Multimodal Language Models

    Authors: Shashank Sonkar, Naiming Liu, Richard G. Baraniuk

    Abstract: Transformer models trained on massive text corpora have become the de facto models for a wide range of natural language processing tasks. However, learning effective word representations for function words remains challenging. Multimodal learning, which visually grounds transformer models in imagery, can overcome the challenges to some extent; however, there is still much work to be done. In this…

    Submitted 22 October, 2022; originally announced October 2022.

  24. arXiv:2210.12100  [pdf, other]

    cs.CV cs.LG stat.ML

    Boomerang: Local sampling on image manifolds using diffusion models

    Authors: Lorenzo Luzi, Paul M Mayer, Josue Casco-Rodriguez, Ali Siahkoohi, Richard G. Baraniuk

    Abstract: The inference stage of diffusion models can be seen as running a reverse-time diffusion stochastic differential equation, where samples from a Gaussian latent distribution are transformed into samples from a target distribution that usually reside on a low-dimensional manifold, e.g., an image manifold. The intermediate values between the initial latent space and the image manifold can be interpret…

    Submitted 17 April, 2024; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: Published in Transactions on Machine Learning Research

  25. arXiv:2209.14778  [pdf, other]

    cs.LG cs.AI cs.CG cs.CV stat.ML

    Batch Normalization Explained

    Authors: Randall Balestriero, Richard G. Baraniuk

    Abstract: A critically important, ubiquitous, and yet poorly understood ingredient in modern deep networks (DNs) is batch normalization (BN), which centers and normalizes the feature maps. To date, only limited progress has been made understanding why BN boosts DN learning and inference performance; work has focused exclusively on showing that BN smooths a DN's loss landscape. In this paper, we study BN the…

    Submitted 29 September, 2022; originally announced September 2022.

  26. arXiv:2208.00579  [pdf, other]

    cs.LG math.NA

    Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization

    Authors: Tan Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang

    Abstract: Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence. Leveraging techniques include sparse and linear attention and hashing tricks; efficient transformers have been proposed to reduce the quadratic complexity of transformers but significantly degrade the accurac…

    Submitted 31 July, 2022; originally announced August 2022.

    Comments: 22 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:2110.07034

    MSC Class: 65Pxx

  27. arXiv:2205.14055  [pdf, other]

    cs.LG stat.ML

    A Blessing of Dimensionality in Membership Inference through Regularization

    Authors: Jasper Tan, Daniel LeJeune, Blake Mason, Hamid Javadi, Richard G. Baraniuk

    Abstract: Is overparameterization a privacy liability? In this work, we study the effect that the number of parameters has on a classifier's vulnerability to membership inference attacks. We first demonstrate how the number of parameters of a model can induce a privacy--utility trade-off: increasing the number of parameters generally improves generalization performance at the expense of lower privacy. Howev…

    Submitted 13 April, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: 26 pages, 14 figures

  28. arXiv:2204.03145  [pdf, other]

    stat.AP cs.LG stat.ML

    DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors

    Authors: Vishwanath Saragadam, Randall Balestriero, Ashok Veeraraghavan, Richard G. Baraniuk

    Abstract: DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-squared approximati…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: 14 pages

  29. arXiv:2203.03716  [pdf, other]

    cs.CY cs.LG

    GPT-based Open-Ended Knowledge Tracing

    Authors: Naiming Liu, Zichao Wang, Richard G. Baraniuk, Andrew Lan

    Abstract: In education applications, knowledge tracing refers to the problem of estimating students' time-varying concept/skill mastery level from their past responses to questions and predicting their future performance. One key limitation of most existing knowledge tracing methods is that they treat student responses to questions as binary-valued, i.e., whether they are correct or incorrect. Response corr…

    Submitted 20 March, 2023; v1 submitted 20 February, 2022; originally announced March 2022.

    Comments: This paper is accepted at EMNLP 2022. The code can be found at https://github.com/lucy66666/OKT

  30. Singular Value Perturbation and Deep Network Optimization

    Authors: Rudolf H. Riedi, Randall Balestriero, Richard G. Baraniuk

    Abstract: We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network. In particular, we explain analytically what deep learning practitioners have long observed empirically: the parameters of some deep architectures (e.g., residual networks, ResNets, and Dense networks, DenseNets) are easier to optimize than others (e.g., convol…

    Submitted 5 December, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: Constr Approx (2022)

  31. arXiv:2202.11811  [pdf, other]

    cs.LG

    NeuroView-RNN: It's About Time

    Authors: CJ Barberan, Sina Alemohammad, Naiming Liu, Randall Balestriero, Richard G. Baraniuk

    Abstract: Recurrent Neural Networks (RNNs) are important tools for processing sequential data such as time-series or video. Interpretability is defined as the ability to be understood by a person and is different from explainability, which is the ability to be explained in a mathematical formulation. A key interpretability issue with RNNs is that it is not clear how each hidden state per time step contribut…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: 21 pages, 13 figures, 9 tables

  32. On Local Distributions in Graph Signal Processing

    Authors: T. Mitchell Roddenberry, Fernando Gama, Richard G. Baraniuk, Santiago Segarra

    Abstract: Graph filtering is the cornerstone operation in graph signal processing (GSP). Thus, understanding it is key in developing potent GSP methods. Graph filters are local and distributed linear operations, whose output depends only on the local neighborhood of each node. Moreover, a graph filter's output can be computed separately at each node by carrying out repeated exchanges with immediate neighbor…

    Submitted 21 February, 2022; originally announced February 2022.

  33. arXiv:2202.03532  [pdf, other]

    cs.CV

    MINER: Multiscale Implicit Neural Representations

    Authors: Vishwanath Saragadam, Jasper Tan, Guha Balakrishnan, Richard G. Baraniuk, Ashok Veeraraghavan

    Abstract: We introduce a new neural signal model designed for efficient high-resolution representation of large-scale signals. The key innovation in our multiscale implicit neural representation (MINER) is an internal representation via a Laplacian pyramid, which provides a sparse multiscale decomposition of the signal that captures orthogonal parts of the signal across scales. We leverage the advantages of…

    Submitted 17 July, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: 14 pages, accepted to ECCV 2022

  34. arXiv:2202.01243  [pdf, other]

    stat.ML cs.LG

    Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference

    Authors: Jasper Tan, Blake Mason, Hamid Javadi, Richard G. Baraniuk

    Abstract: A surprising phenomenon in modern machine learning is the ability of a highly overparameterized model to generalize well (small error on the test data) even when it is trained to memorize the training data (zero error on the training data). This has led to an arms race towards increasingly overparameterized models (c.f., deep learning). In this paper, we study an underexplored hidden cost of overp…

    Submitted 30 November, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: 25 pages, 8 figures

  35. arXiv:2110.08678  [pdf, other]

    cs.LG cs.CL stat.ML

    Improving Transformers with Probabilistic Attention Keys

    Authors: Tam Nguyen, Tan M. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher

    Abstract: Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that for many applications, those attention heads learn redundant embedding, and most of them can be removed without degrading the performance of the model. Inspired by this observati…

    Submitted 12 June, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: 27 pages, 16 figures, 10 tables

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  36. arXiv:2110.07778  [pdf, other]

    cs.CV cs.LG

    NeuroView: Explainable Deep Network Decision Making

    Authors: CJ Barberan, Randall Balestriero, Richard G. Baraniuk

    Abstract: Deep neural networks (DNs) provide superhuman performance in numerous computer vision tasks, yet it remains unclear exactly which of a DN's units contribute to a particular decision. NeuroView is a new family of DN architectures that are interpretable/explainable by design. Each member of the family is derived from a standard DN architecture by vector quantizing the unit output values and feeding…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: 12 pages, 7 figures

  37. arXiv:2110.05240  [pdf, other]

    cs.CV cs.LG

    Evaluating generative networks using Gaussian mixtures of image features

    Authors: Lorenzo Luzi, Carlos Ortiz Marrero, Nile Wynar, Richard G. Baraniuk, Michael J. Henry

    Abstract: We develop a measure for evaluating the performance of generative networks given two sets of images. A popular performance measure currently used to do this is the Fréchet Inception Distance (FID). FID assumes that images featurized using the penultimate layer of Inception-v3 follow a Gaussian distribution, an assumption which cannot be violated if we wish to use FID as a metric. However, we show…

    Submitted 22 July, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

  38. arXiv:2110.04945  [pdf, other]

    cs.LG

    NFT-K: Non-Fungible Tangent Kernels

    Authors: Sina Alemohammad, Hossein Babaei, CJ Barberan, Naiming Liu, Lorenzo Luzi, Blake Mason, Richard G. Baraniuk

    Abstract: Deep neural networks have become essential for numerous applications due to their strong empirical performance such as vision, RL, and classification. Unfortunately, these networks are quite difficult to interpret, and this limits their applicability in settings where interpretability is important for safety, such as medical imaging. One type of deep neural network is neural tangent kernel that is…

    Submitted 10 October, 2021; originally announced October 2021.

  39. arXiv:2110.02915  [pdf, other]

    cs.LG eess.SP stat.CO

    Unrolling Particles: Unsupervised Learning of Sampling Distributions

    Authors: Fernando Gama, Nicolas Zilberstein, Richard G. Baraniuk, Santiago Segarra

    Abstract: Particle filtering is used to compute good nonlinear estimates of complex systems. It samples trajectories from a chosen distribution and computes the estimate as a weighted average. Easy-to-sample distributions often lead to degenerate samples where only one trajectory carries all the weight, negatively affecting the resulting performance of the estimate. While much research has been done on the…

    Submitted 6 October, 2021; originally announced October 2021.

  40. arXiv:2109.04546  [pdf, other]

    cs.CL

    Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints

    Authors: Zichao Wang, Andrew S. Lan, Richard G. Baraniuk

    Abstract: We study the problem of generating arithmetic math word problems (MWPs) given a math equation that specifies the mathematical computation and a context that specifies the problem scenario. Existing approaches are prone to generating MWPs that are either mathematically invalid or have unsatisfactory language quality. They also either ignore the context or require manual specification of a problem t…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  41. arXiv:2109.02355  [pdf, other]

    stat.ML cs.LG

    A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

    Authors: Yehuda Dar, Vidya Muthukumar, Richard G. Baraniuk

    Abstract: The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, which results in them perfectly fitting (i.e., interpo…

    Submitted 6 September, 2021; originally announced September 2021.

  42. arXiv:2106.07769  [pdf, other]

    cs.LG stat.ML

    The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization

    Authors: Daniel LeJeune, Hamid Javadi, Richard G. Baraniuk

    Abstract: Among the most successful methods for sparsifying deep (neural) networks are those that adaptively mask the network weights throughout training. By examining this masking, or dropout, in the linear case, we uncover a duality between such adaptive methods and regularization through the so-called "$η$-trick" that casts both as iteratively reweighted optimizations. We show that any dropout strategy t…

    Submitted 3 January, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: 19 pages, 2 figures. Appeared in NeurIPS 2021. Small typographical correction

  43. arXiv:2104.07824  [pdf, ps, other]

    cs.LG stat.ML

    NePTuNe: Neural Powered Tucker Network for Knowledge Graph Completion

    Authors: Shashank Sonkar, Arzoo Katiyar, Richard G. Baraniuk

    Abstract: Knowledge graphs link entities through relations to provide a structured representation of real world facts. However, they are often incomplete, because they are based on only a small fraction of all plausible facts. The task of knowledge graph completion via link prediction aims to overcome this challenge by inferring missing facts represented as links between entities. Current approaches to link…

    Submitted 15 April, 2021; originally announced April 2021.

  44. arXiv:2104.04034  [pdf, other]

    cs.CY cs.HC

    Results and Insights from Diagnostic Questions: The NeurIPS 2020 Education Challenge

    Authors: Zichao Wang, Angus Lamb, Evgeny Saveliev, Pashmina Cameron, Yordan Zaykov, Jose Miguel Hernandez-Lobato, Richard E. Turner, Richard G. Baraniuk, Craig Barton, Simon Peyton Jones, Simon Woodhead, Cheng Zhang

    Abstract: This competition concerns educational diagnostic questions, which are pedagogically effective, multiple-choice questions (MCQs) whose distractors embody misconceptions. With a large and ever-increasing number of such questions, it becomes overwhelming for teachers to know which questions are the best ones to use for their students. We thus seek to answer the following question: how can we use data…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: text overlap with arXiv:2007.12061

  45. arXiv:2103.05621  [pdf, other]

    cs.LG

    The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression

    Authors: Yehuda Dar, Daniel LeJeune, Richard G. Baraniuk

    Abstract: We study a fundamental transfer learning process from source to target linear regression tasks, including overparameterized settings where there are more learned parameters than data samples. The target task learning is addressed by using its training data together with the parameters previously computed for the source task. We define a transfer learning approach to the target task as a linear reg…

    Submitted 31 May, 2024; v1 submitted 9 March, 2021; originally announced March 2021.

  46. arXiv:2010.13975  [pdf, other]

    eess.SP cs.LG

    Wearing a MASK: Compressed Representations of Variable-Length Sequences Using Recurrent Neural Tangent Kernels

    Authors: Sina Alemohammad, Hossein Babaei, Randall Balestriero, Matt Y. Cheung, Ahmed Imtiaz Humayun, Daniel LeJeune, Naiming Liu, Lorenzo Luzi, Jasper Tan, Zichao Wang, Richard G. Baraniuk

    Abstract: High dimensionality poses many challenges to the use of data, from visualization and interpretation, to prediction and storage for historical preservation. Techniques abound to reduce the dimensionality of fixed-length sequences, yet these methods rarely generalize to variable-length sequences. To address this gap, we extend existing methods that rely on the use of kernels to variable-length seque…

    Submitted 17 April, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

  47. arXiv:2007.12061  [pdf, other]

    cs.CY cs.HC cs.LG

    Instructions and Guide for Diagnostic Questions: The NeurIPS 2020 Education Challenge

    Authors: Zichao Wang, Angus Lamb, Evgeny Saveliev, Pashmina Cameron, Yordan Zaykov, José Miguel Hernández-Lobato, Richard E. Turner, Richard G. Baraniuk, Craig Barton, Simon Peyton Jones, Simon Woodhead, Cheng Zhang

    Abstract: Digital technologies are becoming increasingly prevalent in education, enabling personalized, high quality education resources to be accessible by students across the world. Importantly, among these resources are diagnostic questions: the answers that the students give to these questions reveal key information about the specific nature of misconceptions that the students may hold. Analyzing the ma…

    Submitted 12 April, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: 28 pages, 6 figures, NeurIPS 2020 Competition Track

  48. arXiv:2006.14600  [pdf, ps, other]

    cs.LG stat.ML

    Ensembles of Generative Adversarial Networks for Disconnected Data

    Authors: Lorenzo Luzi, Randall Balestriero, Richard G. Baraniuk

    Abstract: Most current computer vision datasets are composed of disconnected sets, such as images from different classes. We prove that distributions of this type of data cannot be represented with a continuous generative network without error. They can be represented in two ways: With an ensemble of networks or with a single network with truncated latent space. We show that ensembles are more desirable tha…

    Submitted 25 June, 2020; originally announced June 2020.

  49. arXiv:2006.10023  [pdf, other]

    cs.LG stat.ML

    Analytical Probability Distributions and EM-Learning for Deep Generative Networks

    Authors: Randall Balestriero, Sebastien Paris, Richard G. Baraniuk

    Abstract: Deep Generative Networks (DGNs) with probabilistic modeling of their output and latent space are currently trained via Variational Autoencoders (VAEs). In the absence of a known analytical form for the posterior and likelihood expectation, VAEs resort to approximations, including (Amortized) Variational Inference (AVI) and Monte-Carlo (MC) sampling. We exploit the Continuous Piecewise Affine (CPA)…

    Submitted 17 June, 2020; originally announced June 2020.

  50. arXiv:2006.07713  [pdf, other]

    eess.SP cs.LG

    Interpretable Super-Resolution via a Learned Time-Series Representation

    Authors: Randall Balestriero, Herve Glotin, Richard G. Baraniuk

    Abstract: We develop an interpretable and learnable Wigner-Ville distribution that produces a super-resolved quadratic signal representation for time-series analysis. Our approach has two main hallmarks. First, it interpolates between known time-frequency representations (TFRs) in that it can reach super-resolution with increased time and frequency resolution beyond what the Heisenberg uncertainty principle…

    Submitted 13 June, 2020; originally announced June 2020.