
Showing 1–50 of 170 results for author: Baraniuk, R

Searching in archive cs.
  1. arXiv:2407.00938

    cs.CL cs.CY

    MalAlgoQA: A Pedagogical Approach for Evaluating Counterfactual Reasoning Abilities

    Authors: Naiming Liu, Shashank Sonkar, Myco Le, Richard Baraniuk

    Abstract: This paper introduces MalAlgoQA, a novel dataset designed to evaluate the counterfactual reasoning capabilities of Large Language Models (LLMs) through a pedagogical approach. The dataset comprises mathematics and reading comprehension questions, each accompanied by four answer choices and their corresponding rationales. We focus on the incorrect answer rationales, termed "malgorithms", which high…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.13781

    cs.LG cs.AI cs.CL cs.CV stat.ML

    A Primal-Dual Framework for Transformers and Neural Networks

    Authors: Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

    Abstract: Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often developed by heuristics and experience. To provide a principled framework for constructing attention layers in transformers, we show that the self-attention corresp…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted to ICLR 2023, 26 pages, 4 figures, 14 tables

  3. arXiv:2406.13188

    cs.CL cs.LG

    Synthetic Context Generation for Question Generation

    Authors: Naiming Liu, Zichao Wang, Richard Baraniuk

    Abstract: Despite rapid advancements in large language models (LLMs), QG remains a challenging problem due to its complicated process, open-ended nature, and the diverse settings in which question generation occurs. A common approach to address these challenges involves fine-tuning smaller, custom models using datasets containing background context, question, and answer. However, obtaining suitable domain-s…

    Submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2406.09657

    cs.LG stat.ML

    ScaLES: Scalable Latent Exploration Score for Pre-Trained Generative Networks

    Authors: Omer Ronen, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk, Bin Yu

    Abstract: We develop Scalable Latent Exploration Score (ScaLES) to mitigate over-exploration in Latent Space Optimization (LSO), a popular method for solving black-box discrete optimization problems. LSO utilizes continuous optimization within the latent space of a Variational Autoencoder (VAE) and is known to be susceptible to over-exploration, which manifests in unrealistic solutions that reduce its pract…

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2405.13977

    cs.LG stat.ML

    Removing Bias from Maximum Likelihood Estimation with Model Autophagy

    Authors: Paul Mayer, Lorenzo Luzi, Ali Siahkoohi, Don H. Johnson, Richard G. Baraniuk

    Abstract: We propose autophagy penalized likelihood estimation (PLE), an unbiased alternative to maximum likelihood estimation (MLE) which is more fair and less susceptible to model autophagy disorder (madness). Model autophagy refers to models trained on their own output; PLE ensures the statistics of these outputs coincide with the data statistics. This enables PLE to be statistically unbiased in certain…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 9 Pages, submission for NeurIPS 2024

    MSC Class: 68T07

  6. arXiv:2405.08134

    cs.CL

    Many-Shot Regurgitation (MSR) Prompting

    Authors: Shashank Sonkar, Richard G. Baraniuk

    Abstract: We introduce Many-Shot Regurgitation (MSR) prompting, a new black-box membership inference attack framework for examining verbatim content reproduction in large language models (LLMs). MSR prompting involves dividing the input text into multiple segments and creating a single prompt that includes a series of faux conversation rounds between a user and a language model to elicit verbatim regurgitat…

    Submitted 13 May, 2024; originally announced May 2024.

  7. arXiv:2404.15156

    cs.CL

    Regressive Side Effects of Training Language Models to Mimic Student Misconceptions

    Authors: Shashank Sonkar, Naiming Liu, Richard G. Baraniuk

    Abstract: This paper presents a novel exploration into the regressive side effects of training Large Language Models (LLMs) to mimic student misconceptions for personalized education. We highlight the problem that as LLMs are trained to more accurately mimic student misconceptions, there is a compromise in the factual integrity and reasoning ability of the models. Our work involved training an LLM on a stud…

    Submitted 23 April, 2024; originally announced April 2024.

  8. arXiv:2404.14316

    cs.CL

    Automated Long Answer Grading with RiceChem Dataset

    Authors: Shashank Sonkar, Kangqi Ni, Lesa Tran Lu, Kristi Kincaid, John S. Hutchinson, Richard G. Baraniuk

    Abstract: We introduce a new area of study in the field of educational Natural Language Processing: Automated Long Answer Grading (ALAG). Distinguishing itself from Automated Short Answer Grading (ASAG) and Automated Essay Grading (AEG), ALAG presents unique challenges due to the complexity and multifaceted nature of fact-based long answers. To study ALAG, we introduce RiceChem, a dataset derived from a col…

    Submitted 22 April, 2024; originally announced April 2024.

  9. arXiv:2404.14301

    cs.CL

    Marking: Visual Grading with Highlighting Errors and Annotating Missing Bits

    Authors: Shashank Sonkar, Naiming Liu, Debshila B. Mallick, Richard G. Baraniuk

    Abstract: In this paper, we introduce "Marking", a novel grading task that enhances automated grading systems by performing an in-depth analysis of student responses and providing students with visual highlights. Unlike traditional systems that provide binary scores, "marking" identifies and categorizes segments of the student response as correct, incorrect, or irrelevant and detects omissions from gold ans…

    Submitted 22 April, 2024; originally announced April 2024.

  10. arXiv:2402.15989

    cs.AI eess.SY

    PIDformer: Transformer Meets Control Theory

    Authors: Tam Nguyen, César A. Uribe, Tan M. Nguyen, Richard G. Baraniuk

    Abstract: In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its solutions, leading to lower-rank outputs and diminished representation capacity. Moreover, the steady-state solution of the model is sensitive to input p…

    Submitted 25 February, 2024; originally announced February 2024.

  11. arXiv:2402.15555

    cs.LG cs.AI cs.CV

    Deep Networks Always Grok and Here is Why

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: Grokking, or delayed generalization, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near zero training error. Previous studies have reported the occurrence of grokking in specific controlled settings, such as DNNs initialized with large-norm parameters or transformers trained on algorithmic datasets. We demonstrate that grokking is actually much mor…

    Submitted 6 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: ICML 2024. Website: https://bit.ly/grok-adversarial. Pages 24, Figures 36

  12. arXiv:2402.05000

    cs.CL

    Pedagogical Alignment of Large Language Models

    Authors: Shashank Sonkar, Kangqi Ni, Sapana Chaudhary, Richard G. Baraniuk

    Abstract: In this paper, we introduce the novel concept of pedagogically aligned Large Language Models (LLMs) that signifies a transformative shift in the application of LLMs within educational contexts. Rather than providing direct responses to user queries, pedagogically-aligned LLMs function as scaffolding tools, breaking complex problems into manageable subproblems and guiding students towards the final…

    Submitted 7 February, 2024; originally announced February 2024.

  13. arXiv:2401.14429

    cs.LG cs.RO eess.SP stat.ML

    [Re] The Discriminative Kalman Filter for Bayesian Filtering with Nonlinear and Non-Gaussian Observation Models

    Authors: Josue Casco-Rodriguez, Caleb Kemere, Richard G. Baraniuk

    Abstract: Kalman filters provide a straightforward and interpretable means to estimate hidden or latent variables, and have found numerous applications in control, robotics, signal processing, and machine learning. One such application is neural decoding for neuroprostheses. In 2020, Burkhart et al. thoroughly evaluated their new version of the Kalman filter that leverages Bayes' theorem to improve filter p…

    Submitted 24 January, 2024; originally announced January 2024.

  14. arXiv:2312.09323

    cs.AI cs.LG

    Perspectives on the State and Future of Deep Learning - 2023

    Authors: Micah Goldblum, Anima Anandkumar, Richard Baraniuk, Tom Goldstein, Kyunghyun Cho, Zachary C Lipton, Melanie Mitchell, Preetum Nakkiran, Max Welling, Andrew Gordon Wilson

    Abstract: The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time. The plan is to host this survey periodically until the AI singularity paperclip-frenzy-driven doomsday, keeping an updated list of topical questions and interviewing new community members for each edition. In this issue, we probed people's opinions on inter…

    Submitted 18 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

  15. arXiv:2312.00751

    cs.CL cs.AI

    Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals

    Authors: Tam Nguyen, Tan M. Nguyen, Richard G. Baraniuk

    Abstract: Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications. However, the representation capacity of a deep transformer model is degraded due to the over-smoothing issue in which the token representations become identical when the model's depth grows. In this work, we show that self-attention layers in transformers minimize a functi…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 24 pages

  16. arXiv:2310.12977

    cs.LG cs.AI cs.CV

    Training Dynamics of Deep Network Linear Regions

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: The study of Deep Network (DN) training dynamics has largely focused on the evolution of the loss function, evaluated on or around train and test set data points. In fact, many DN phenomenon were first introduced in literature with that respect, e.g., double descent, grokking. In this study, we look at the training dynamics of the input space partition or linear regions formed by continuous piecew…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 14 pages, 14 figures

  17. arXiv:2310.02439

    cs.CL

    Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions

    Authors: Naiming Liu, Shashank Sonkar, Zichao Wang, Simon Woodhead, Richard G. Baraniuk

    Abstract: We propose novel evaluations for mathematical reasoning capabilities of Large Language Models (LLMs) based on mathematical misconceptions. Our primary approach is to simulate LLMs as a novice learner and an expert tutor, aiming to identify the incorrect answer to math question resulted from a specific misconception and to recognize the misconception(s) behind an incorrect answer, respectively. Con…

    Submitted 3 October, 2023; originally announced October 2023.

  18. arXiv:2310.00545

    eess.SP cs.CV eess.IV

    Implicit Neural Representations and the Algebra of Complex Wavelets

    Authors: T. Mitchell Roddenberry, Vishwanath Saragadam, Maarten V. de Hoop, Richard G. Baraniuk

    Abstract: Implicit neural representations (INRs) have arisen as useful methods for representing signals on Euclidean domains. By parameterizing an image as a multilayer perceptron (MLP) on Euclidean space, INRs effectively represent signals in a way that couples spatial and spectral features of the signal that is not obvious in the usual discrete representation, paving the way for continuous signal processi…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: 10 pages, 6 figures. 2 appendix pages, 1 appendix figure

  19. arXiv:2309.12161

    cs.CL

    Code Soliloquies for Accurate Calculations in Large Language Models

    Authors: Shashank Sonkar, MyCo Le, Xinghe Chen, Naiming Liu, Debshila Basu Mallick, Richard G. Baraniuk

    Abstract: High-quality conversational datasets are crucial for the successful development of Intelligent Tutoring Systems (ITS) that utilize a Large Language Model (LLM) backend. Synthetic student-teacher dialogues, generated using advanced GPT-4 models, are a common strategy for creating these datasets. However, subjects like physics that entail complex calculations pose a challenge. While GPT-4 presents i…

    Submitted 31 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

  20. arXiv:2307.04643

    cs.CL cs.AI

    MultiQG-TI: Towards Question Generation from Multi-modal Sources

    Authors: Zichao Wang, Richard Baraniuk

    Abstract: We study the new problem of automatic question generation (QG) from multi-modal sources containing images and texts, significantly expanding the scope of most of the existing work that focuses exclusively on QG from only textual sources. We propose a simple solution for our new problem, called MultiQG-TI, which enables a text-only question generator to process visual input in addition to textual i…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted at BEA workshop 2023; code https://github.com/moonlightlane/MultiQG-TI

  21. arXiv:2307.01850

    cs.LG cs.AI cs.CV

    Self-Consuming Generative Models Go MAD

    Authors: Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk

    Abstract: Seismic advances in generative AI algorithms for imagery, text, and other data types has led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose properties are poorly understood. We conduct a thorough analytical and empirical analysis using state-of-the-art generative image models of three families of au…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 31 pages, 31 figures, pre-print

  22. arXiv:2305.14507

    cs.CL

    Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models

    Authors: Shashank Sonkar, Richard G. Baraniuk

    Abstract: We explore whether Large Language Models (LLMs) are capable of logical reasoning with distorted facts, which we call Deduction under Perturbed Evidence (DUPE). DUPE presents a unique challenge to LLMs since they typically rely on their parameters, which encode mostly accurate information, to reason and make inferences. However, in DUPE, LLMs must reason over manipulated or falsified evidence prese…

    Submitted 23 May, 2023; originally announced May 2023.

  23. arXiv:2305.13297

    cs.CL

    Investigating the Role of Feed-Forward Networks in Transformers Using Parallel Attention and Feed-Forward Net Design

    Authors: Shashank Sonkar, Richard G. Baraniuk

    Abstract: This paper investigates the key role of Feed-Forward Networks (FFNs) in transformer models by utilizing the Parallel Attention and Feed-Forward Net Design (PAF) architecture, and comparing it to their Series Attention and Feed-Forward Net Design (SAF) counterparts. Central to the effectiveness of PAF are two main assumptions regarding the FFN block and the attention block within a layer: 1) the pr…

    Submitted 25 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  24. arXiv:2305.13272

    cs.CL

    CLASS: A Design Framework for building Intelligent Tutoring Systems based on Learning Science principles

    Authors: Shashank Sonkar, Naiming Liu, Debshila Basu Mallick, Richard G. Baraniuk

    Abstract: We present a design framework called Conversational Learning with Analytical Step-by-Step Strategies (CLASS) for building advanced Intelligent Tutoring Systems (ITS) powered by high-performance Large Language Models (LLMs). The CLASS framework empowers ITS with two key capabilities. First, through a carefully curated scaffolding dataset, CLASS equips ITS with essential problem-solving strategies,…

    Submitted 25 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Paper accepted at EMNLP 2023

  25. arXiv:2302.12828

    cs.CV cs.LG

    SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Guha Balakrishnan, Richard Baraniuk

    Abstract: Current Deep Network (DN) visualization and interpretability methods rely heavily on data space visualizations such as scoring which dimensions of the data are responsible for their associated prediction or generating new data features or samples that best match a given DN unit or representation. In this paper, we go one step further by developing the first provably exact method for computing the…

    Submitted 6 June, 2024; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: 11 pages, 20 figures

  26. arXiv:2301.05187

    cs.CV cs.GR eess.IV

    WIRE: Wavelet Implicit Neural Representations

    Authors: Vishwanath Saragadam, Daniel LeJeune, Jasper Tan, Guha Balakrishnan, Ashok Veeraraghavan, Richard G. Baraniuk

    Abstract: Implicit neural representations (INRs) have recently advanced numerous vision-related areas. INR performance depends strongly on the choice of the nonlinear activation function employed in its multilayer perceptron (MLP) network. A wide range of nonlinearities have been explored, but, unfortunately, current INRs designed to have high accuracy also suffer from poor robustness (to signal noise, para…

    Submitted 5 January, 2023; originally announced January 2023.

  27. arXiv:2212.09723

    cs.CL

    MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages

    Authors: Shashank Sonkar, Zichao Wang, Richard G. Baraniuk

    Abstract: This paper investigates the problem of Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged data samples. NER is a fundamental task in Natural Language Processing (NLP). A critical driver accelerating NER systems' progress is the existence of large-scale language corpora that enable NER systems to achieve outstanding performance in languages such as Engl…

    Submitted 19 December, 2022; originally announced December 2022.

  28. arXiv:2212.06345

    physics.optics cs.CV

    Foveated Thermal Computational Imaging in the Wild Using All-Silicon Meta-Optics

    Authors: Vishwanath Saragadam, Zheyi Han, Vivek Boominathan, Luocheng Huang, Shiyu Tan, Johannes E. Fröch, Karl F. Böhringer, Richard G. Baraniuk, Arka Majumdar, Ashok Veeraraghavan

    Abstract: Foveated imaging provides a better tradeoff between situational awareness (field of view) and resolution and is critical in long-wavelength infrared regimes because of the size, weight, power, and cost of thermal sensors. We demonstrate computational foveated imaging by exploiting the ability of a meta-optical frontend to discriminate between different polarization states and a computational backe…

    Submitted 12 December, 2022; originally announced December 2022.

  29. arXiv:2211.11074

    cs.LG

    Frozen Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks

    Authors: Yehuda Dar, Lorenzo Luzi, Richard G. Baraniuk

    Abstract: We study the generalization behavior of transfer learning of deep neural networks (DNNs). We adopt the overparameterization perspective -- featuring interpolation of the training data (i.e., approximately zero train error) and the double descent phenomenon -- to explain the delicate effect of the transfer learning setting on generalization performance. We study how the generalization behavior of t…

    Submitted 12 June, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

  30. arXiv:2211.03751

    math.NA cs.DS math.ST

    Asymptotics of the Sketched Pseudoinverse

    Authors: Daniel LeJeune, Pratik Patil, Hamid Javadi, Richard G. Baraniuk, Ryan J. Tibshirani

    Abstract: We take a random matrix theory approach to random sketching and show an asymptotic first-order equivalence of the regularized sketched pseudoinverse of a positive semidefinite matrix to a certain evaluation of the resolvent of the same matrix. We focus on real-valued regularization and extend previous results on an asymptotic equivalence of random matrices to the real setting, providing a precise…

    Submitted 6 October, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: 45 pages, 9 figures

    MSC Class: 15B52; 46L54; 62J07

  31. arXiv:2210.12565

    cs.CL cs.LG

    A Visual Tour Of Current Challenges In Multimodal Language Models

    Authors: Shashank Sonkar, Naiming Liu, Richard G. Baraniuk

    Abstract: Transformer models trained on massive text corpora have become the de facto models for a wide range of natural language processing tasks. However, learning effective word representations for function words remains challenging. Multimodal learning, which visually grounds transformer models in imagery, can overcome the challenges to some extent; however, there is still much work to be done. In this…

    Submitted 22 October, 2022; originally announced October 2022.

  32. arXiv:2210.12100

    cs.CV cs.LG stat.ML

    Boomerang: Local sampling on image manifolds using diffusion models

    Authors: Lorenzo Luzi, Paul M Mayer, Josue Casco-Rodriguez, Ali Siahkoohi, Richard G. Baraniuk

    Abstract: The inference stage of diffusion models can be seen as running a reverse-time diffusion stochastic differential equation, where samples from a Gaussian latent distribution are transformed into samples from a target distribution that usually reside on a low-dimensional manifold, e.g., an image manifold. The intermediate values between the initial latent space and the image manifold can be interpret…

    Submitted 17 April, 2024; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: Published in Transactions on Machine Learning Research

  33. arXiv:2209.14778

    cs.LG cs.AI cs.CG cs.CV stat.ML

    Batch Normalization Explained

    Authors: Randall Balestriero, Richard G. Baraniuk

    Abstract: A critically important, ubiquitous, and yet poorly understood ingredient in modern deep networks (DNs) is batch normalization (BN), which centers and normalizes the feature maps. To date, only limited progress has been made understanding why BN boosts DN learning and inference performance; work has focused exclusively on showing that BN smooths a DN's loss landscape. In this paper, we study BN the…

    Submitted 29 September, 2022; originally announced September 2022.

  34. arXiv:2208.11126

    q-bio.QM cs.LG

    Retrieval-based Controllable Molecule Generation

    Authors: Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard Baraniuk, Anima Anandkumar

    Abstract: Generating new molecules with specified chemical and biological properties via generative models has emerged as a promising direction for drug discovery. However, existing methods require extensive training/fine-tuning with a large dataset, often unavailable in real-world generation tasks. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small…

    Submitted 24 April, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: ICLR 2023

  35. arXiv:2208.00579

    cs.LG math.NA

    Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization

    Authors: Tan Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang

    Abstract: Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence. Leveraging techniques include sparse and linear attention and hashing tricks; efficient transformers have been proposed to reduce the quadratic complexity of transformers but significantly degrade the accurac…

    Submitted 31 July, 2022; originally announced August 2022.

    Comments: 22 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:2110.07034

    MSC Class: 65Pxx

  36. arXiv:2205.14055

    cs.LG stat.ML

    A Blessing of Dimensionality in Membership Inference through Regularization

    Authors: Jasper Tan, Daniel LeJeune, Blake Mason, Hamid Javadi, Richard G. Baraniuk

    Abstract: Is overparameterization a privacy liability? In this work, we study the effect that the number of parameters has on a classifier's vulnerability to membership inference attacks. We first demonstrate how the number of parameters of a model can induce a privacy--utility trade-off: increasing the number of parameters generally improves generalization performance at the expense of lower privacy. Howev…

    Submitted 13 April, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: 26 pages, 14 figures

  37. arXiv:2205.09864

    cs.LG cs.AI cs.CY

    Automated Scoring for Reading Comprehension via In-context BERT Tuning

    Authors: Nigel Fernandez, Aritra Ghosh, Naiming Liu, Zichao Wang, Benoît Choffin, Richard Baraniuk, Andrew Lan

    Abstract: Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring often leverage textual representations based on pre-trained language models such as BERT and GPT as input to scoring models. Most existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring…

    Submitted 15 June, 2023; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: Published as a conference paper at AIED 2022. A grand prize-winner for the NAEP AS Challenge. Code available at: https://github.com/ni9elf/automated-scoring

  38. arXiv:2204.03145

    stat.AP cs.LG stat.ML

    DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors

    Authors: Vishwanath Saragadam, Randall Balestriero, Ashok Veeraraghavan, Richard G. Baraniuk

    Abstract: DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-squared approximati…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: 14 pages

  39. arXiv:2203.08124

    cs.LG cs.CV

    Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective

    Authors: Gowthami Somepalli, Liam Fowl, Arpit Bansal, Ping Yeh-Chiang, Yehuda Dar, Richard Baraniuk, Micah Goldblum, Tom Goldstein

    Abstract: We discuss methods for visualizing neural network decision boundaries and decision regions. We use these visualizations to investigate issues related to reproducibility and generalization in neural network training. We observe that changes in model architecture (and its associate inductive bias) cause visible changes in decision boundaries, while multiple runs with the same architecture yield resu…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: To appear in CVPR 2022

  40. arXiv:2203.03716

    cs.CY cs.LG

    GPT-based Open-Ended Knowledge Tracing

    Authors: Naiming Liu, Zichao Wang, Richard G. Baraniuk, Andrew Lan

    Abstract: In education applications, knowledge tracing refers to the problem of estimating students' time-varying concept/skill mastery level from their past responses to questions and predicting their future performance. One key limitation of most existing knowledge tracing methods is that they treat student responses to questions as binary-valued, i.e., whether they are correct or incorrect. Response corr…

    Submitted 20 March, 2023; v1 submitted 20 February, 2022; originally announced March 2022.

    Comments: This paper is accepted at EMNLP 2022. The code can be found at https://github.com/lucy66666/OKT

  41. Singular Value Perturbation and Deep Network Optimization

    Authors: Rudolf H. Riedi, Randall Balestriero, Richard G. Baraniuk

    Abstract: We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network. In particular, we explain analytically what deep learning practitioners have long observed empirically: the parameters of some deep architectures (e.g., residual networks, ResNets, and Dense networks, DenseNets) are easier to optimize than others (e.g., convol…

    Submitted 5 December, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: Constr Approx (2022)

  42. arXiv:2203.02502

    cs.LG cs.AI

    No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Anastasios Kyrillidis, Richard Baraniuk

    Abstract: Centroid based clustering methods such as k-means, k-medoids and k-centers are heavily applied as a go-to tool in exploratory data analysis. In many cases, those methods are used to obtain representative centroids of the data manifold for visualization or summarization of a dataset. Real world datasets often contain inherent abnormalities, e.g., repeated samples and sampling bias, that manifest im…

    Submitted 15 June, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted for ICASSP 2022, 8 figures, 1 table

  43. arXiv:2203.01993

    cs.CV

    Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: We present Polarity Sampling, a theoretically justified plug-and-play method for controlling the generation quality and diversity of pre-trained deep generative networks (DGNs). Leveraging the fact that DGNs are, or can be approximated by, continuous piecewise affine splines, we derive the analytical DGN output space distribution as a function of the product of the DGN's Jacobian singular values ra…

    Submitted 6 May, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: 20 pages, 16 figures, CVPR 2022 Oral, Camera Ready

  44. arXiv:2202.11811  [pdf, other

    cs.LG

    NeuroView-RNN: It's About Time

    Authors: CJ Barberan, Sina Alemohammad, Naiming Liu, Randall Balestriero, Richard G. Baraniuk

    Abstract: Recurrent Neural Networks (RNNs) are important tools for processing sequential data such as time-series or video. Interpretability is defined as the ability to be understood by a person and is different from explainability, which is the ability to be explained in a mathematical formulation. A key interpretability issue with RNNs is that it is not clear how each hidden state per time step contribut…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: 21 pages, 13 figures, 9 tables

  45. On Local Distributions in Graph Signal Processing

    Authors: T. Mitchell Roddenberry, Fernando Gama, Richard G. Baraniuk, Santiago Segarra

    Abstract: Graph filtering is the cornerstone operation in graph signal processing (GSP). Thus, understanding it is key in developing potent GSP methods. Graph filters are local and distributed linear operations, whose output depends only on the local neighborhood of each node. Moreover, a graph filter's output can be computed separately at each node by carrying out repeated exchanges with immediate neighbor…

    Submitted 21 February, 2022; originally announced February 2022.
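    The abstract above describes graph filters as local linear operations computed by repeated exchanges with immediate neighbors. A minimal sketch, assuming the common polynomial definition y = h_0 x + h_1 S x + ... + h_K S^K x with the adjacency matrix as the shift operator S (the paper may use a different shift or filter class; all function names are hypothetical):

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # node -> list of immediate neighbors (undirected adjacency)


def shift(graph: Graph, x: Dict[int, float]) -> Dict[int, float]:
    """One round of neighbor exchanges: each node sums its neighbors' values (S = adjacency)."""
    return {node: sum(x[nbr] for nbr in nbrs) for node, nbrs in graph.items()}


def graph_filter(graph: Graph, x: Dict[int, float], taps: List[float]) -> Dict[int, float]:
    """Polynomial graph filter y = sum_k taps[k] * S^k x.

    Uses len(taps) - 1 exchange rounds, so each node's output depends only on
    nodes within len(taps) - 1 hops of it.
    """
    y = {node: 0.0 for node in graph}
    z = dict(x)  # z holds S^k x, starting at k = 0
    for k, h in enumerate(taps):
        if k > 0:
            z = shift(graph, z)  # one more hop of local communication
        for node in graph:
            y[node] += h * z[node]
    return y
```

    For example, on the path graph 0-1-2-3 with a unit impulse at node 0, a filter with two exchange rounds (`taps=[1.0, 1.0, 1.0]`) leaves node 3, three hops away, at zero, which is the locality property the abstract states.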

  46. arXiv:2202.07829  [pdf, other

    cs.LG cs.CV

    Spatial Transformer K-Means

    Authors: Romain Cosentino, Randall Balestriero, Yanis Bahroun, Anirvan Sengupta, Richard Baraniuk, Behnaam Aazhang

    Abstract: K-means is one of the most widely employed centroid-based clustering algorithms, and its performance is tied to the data's embedding. Intricate data embeddings have been designed to push K-means performance at the cost of reduced theoretical guarantees and interpretability of the results. Instead, we propose preserving the intrinsic data space and augmenting K-means with a similarity measure invariant to…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2012.09743

  47. arXiv:2202.03532  [pdf, other

    cs.CV

    MINER: Multiscale Implicit Neural Representations

    Authors: Vishwanath Saragadam, Jasper Tan, Guha Balakrishnan, Richard G. Baraniuk, Ashok Veeraraghavan

    Abstract: We introduce a new neural signal model designed for efficient high-resolution representation of large-scale signals. The key innovation in our multiscale implicit neural representation (MINER) is an internal representation via a Laplacian pyramid, which provides a sparse multiscale decomposition of the signal that captures orthogonal parts of the signal across scales. We leverage the advantages of…

    Submitted 17 July, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: 14 pages, accepted to ECCV 2022
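    The MINER abstract relies on a Laplacian pyramid as a sparse multiscale decomposition. A minimal 1D sketch of a generic Laplacian pyramid follows; it is not MINER's implementation. It assumes a simple box filter for downsampling and nearest-neighbour upsampling (the classic construction blurs with a Gaussian), and the function names are hypothetical.

```python
from typing import List, Tuple


def downsample(x: List[float]) -> List[float]:
    """Average neighbouring pairs, keeping one value per pair (box filter + decimation by 2)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample(x: List[float]) -> List[float]:
    """Nearest-neighbour expansion: repeat each value twice."""
    return [v for v in x for _ in range(2)]


def build_pyramid(x: List[float], levels: int) -> Tuple[List[List[float]], List[float]]:
    """Split x into per-scale detail bands plus a coarse residual.

    Assumes len(x) is divisible by 2 ** levels.
    """
    details: List[List[float]] = []
    current = x
    for _ in range(levels):
        coarse = downsample(current)
        # Detail band: what the coarser scale fails to predict at this scale.
        details.append([a - b for a, b in zip(current, upsample(coarse))])
        current = coarse
    return details, current


def reconstruct(details: List[List[float]], coarsest: List[float]) -> List[float]:
    """Invert build_pyramid: upsample and add back each detail band, coarse to fine."""
    current = coarsest
    for band in reversed(details):
        current = [a + b for a, b in zip(upsample(current), band)]
    return current
```

    Applied to a signal such as `[1.0, 3.0, 2.0, 2.0, 5.0, 5.0, 0.0, 4.0]`, the piecewise-constant stretches produce exactly zero detail coefficients, which illustrates why such a decomposition tends to be sparse, while `reconstruct` recovers the input from the bands.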

  48. arXiv:2202.01243  [pdf, other

    stat.ML cs.LG

    Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference

    Authors: Jasper Tan, Blake Mason, Hamid Javadi, Richard G. Baraniuk

    Abstract: A surprising phenomenon in modern machine learning is the ability of a highly overparameterized model to generalize well (small error on the test data) even when it is trained to memorize the training data (zero error on the training data). This has led to an arms race towards increasingly overparameterized models (cf. deep learning). In this paper, we study an underexplored hidden cost of overp…

    Submitted 30 November, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: 25 pages, 8 figures

  49. arXiv:2110.08678  [pdf, other

    cs.LG cs.CL stat.ML

    Improving Transformers with Probabilistic Attention Keys

    Authors: Tam Nguyen, Tan M. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher

    Abstract: Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that for many applications, those attention heads learn redundant embeddings, and most of them can be removed without degrading the performance of the model. Inspired by this observati…

    Submitted 12 June, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: 27 pages, 16 figures, 10 tables

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  50. arXiv:2110.08009  [pdf, other

    cs.LG cs.CV

    MaGNET: Uniform Sampling from Deep Generative Network Manifolds Without Retraining

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: Deep Generative Networks (DGNs) are extensively employed in Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and their variants to approximate the data manifold and distribution. However, training samples are often distributed in a non-uniform fashion on the manifold, due to costs or convenience of collection. For example, the CelebA dataset contains a large fraction of smi…

    Submitted 20 January, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: ICLR Accepted version, 28 pages, 23 figures