Showing 1–38 of 38 results for author: Kvinge, H

  1. arXiv:2406.05496  [pdf, other]

    cs.CL

    Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities

    Authors: Sai Munikoti, Ian Stewart, Sameera Horawalavithana, Henry Kvinge, Tegan Emerson, Sandra E Thompson, Karl Pazdernik

    Abstract: Multimodal models are expected to be a critical component to future advances in artificial intelligence. This field is starting to grow rapidly with a surge of new design elements motivated by the success of foundation models in natural language processing (NLP) and vision. It is widely hoped that further extending the foundation models to multiple modalities (e.g., text, image, video, sensor, tim…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 25 pages, 3 figures, 5 tables

  2. arXiv:2312.04600  [pdf, other]

    cond-mat.mes-hall cs.LG math.AT

    Haldane Bundles: A Dataset for Learning to Predict the Chern Number of Line Bundles on the Torus

    Authors: Cody Tipton, Elizabeth Coda, Davis Brown, Alyson Bittner, Jung Lee, Grayson Jorgenson, Tegan Emerson, Henry Kvinge

    Abstract: Characteristic classes, which are abstract topological invariants associated with vector bundles, have become an important notion in modern physics with surprising real-world consequences. As a representative example, the incredible properties of topological insulators, which are insulators in their bulk but conductors on their surface, can be completely characterized by a specific characteristic…

    Submitted 6 December, 2023; originally announced December 2023.

  3. arXiv:2310.14993  [pdf, other]

    cs.LG cs.AI cs.CL

    Understanding the Inner Workings of Language Models Through Representation Dissimilarity

    Authors: Davis Brown, Charles Godfrey, Nicholas Konz, Jonathan Tu, Henry Kvinge

    Abstract: As language models are applied to an increasing number of real-world applications, understanding their inner workings has become an important issue in model trust, interpretability, and transparency. In this work we show that representation dissimilarity measures, which are functions that measure the extent to which two model's internal representations differ, can be a valuable tool for gaining in…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 (main)

  4. arXiv:2310.03149  [pdf, other]

    cs.LG cs.AI cs.CV

    Attributing Learned Concepts in Neural Networks to Training Data

    Authors: Nicholas Konz, Charles Godfrey, Madelyn Shapiro, Jonathan Tu, Henry Kvinge, Davis Brown

    Abstract: By now there is substantial evidence that deep learning models learn certain human-interpretable features as part of their internal representations of data. As having the right (or wrong) concepts is critical to trustworthy machine learning systems, it is natural to ask which inputs from the model's original training set were most important for learning a concept at a given layer. To answer this,…

    Submitted 28 December, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: ATTRIB Workshop at NeurIPS 2023

  5. ICML 2023 Topological Deep Learning Challenge : Design and Results

    Authors: Mathilde Papillon, Mustafa Hajij, Helen Jenne, Johan Mathe, Audun Myers, Theodore Papamarkou, Tolga Birdal, Tamal Dey, Tim Doster, Tegan Emerson, Gurusankar Gopalakrishnan, Devendra Govil, Aldo Guzmán-Sáenz, Henry Kvinge, Neal Livesay, Soham Mukherjee, Shreyas N. Samaga, Karthikeyan Natesan Ramamurthy, Maneel Reddy Karri, Paul Rosen, Sophia Sanborn, Robin Walters, Jens Agerberg, Sadrodin Barikbin, Claudio Battiloro, et al. (31 additional authors not shown)

    Abstract: This paper presents the computational challenge on topological deep learning that was hosted within the ICML 2023 Workshop on Topology and Geometry in Machine Learning. The competition asked participants to provide open-source implementations of topological neural networks from the literature by contributing to the python packages TopoNetX (data processing) and TopoModelX (deep learning). The chal…

    Submitted 18 January, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

  6. arXiv:2307.01139  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions

    Authors: Sameera Horawalavithana, Sai Munikoti, Ian Stewart, Henry Kvinge

    Abstract: Instruction finetuning is a popular paradigm to align large language models (LLM) with human intent. Despite its popularity, this idea is less explored in improving the LLMs to align existing foundation models with scientific disciplines, concepts and goals. In this work, we present SciTune as a tuning framework to improve the ability of LLMs to follow scientific multimodal instructions. To test o…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Preprint. Work in progress

  7. arXiv:2305.13509  [pdf, ps, other]

    cs.CV cs.AI cs.LG eess.IV

    ColMix -- A Simple Data Augmentation Framework to Improve Object Detector Performance and Robustness in Aerial Images

    Authors: Cuong Ly, Grayson Jorgenson, Dan Rosa de Jesus, Henry Kvinge, Adam Attarian, Yijing Watkins

    Abstract: In the last decade, Convolutional Neural Network (CNN) and transformer based object detectors have achieved high performance on a large variety of datasets. Though the majority of detection literature has developed this capability on datasets such as MS COCO, these detectors have still proven effective for remote sensing applications. Challenges in this particular domain, such as small numbers of…

    Submitted 22 May, 2023; originally announced May 2023.

  8. arXiv:2303.14173  [pdf, other]

    cs.LG cs.CR stat.ML

    How many dimensions are required to find an adversarial example?

    Authors: Charles Godfrey, Henry Kvinge, Elise Bishoff, Myles Mckay, Davis Brown, Tim Doster, Eleanor Byler

    Abstract: Past work exploring adversarial vulnerability have focused on situations where an adversary can perturb all dimensions of model input. On the other hand, a range of recent works consider the case where either (i) an adversary can perturb a limited number of input parameters or (ii) a subset of modalities in a multimodal problem. In both of these cases, adversarial examples are effectively constrai…

    Submitted 10 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: Comments welcome! V2: minor edits for clarity

    MSC Class: 68T07 (Primary) ACM Class: G.3; I.2; I.5; J.2

  9. arXiv:2303.06208  [pdf, ps, other]

    cs.LG math.CO math.RT stat.ML

    Fast computation of permutation equivariant layers with the partition algebra

    Authors: Charles Godfrey, Michael G. Rawson, Davis Brown, Henry Kvinge

    Abstract: Linear neural network layers that are either equivariant or invariant to permutations of their inputs form core building blocks of modern deep learning architectures. Examples include the layers of DeepSets, as well as linear layers occurring in attention blocks of transformers and some graph neural networks. The space of permutation equivariant linear layers can be identified as the invariant sub…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Comments welcome!

    MSC Class: 68T07 (Primary) 05E10; 20C30 (Secondary) ACM Class: G.3; I.2; I.5; J.2

  10. arXiv:2303.00046  [pdf, other]

    cs.LG

    Edit at your own risk: evaluating the robustness of edited models to distribution shifts

    Authors: Davis Brown, Charles Godfrey, Cody Nizinski, Jonathan Tu, Henry Kvinge

    Abstract: The current trend toward ever-larger models makes standard retraining procedures an ever-more expensive burden. For this reason, there is growing interest in model editing, which enables computationally inexpensive, interpretable, post-hoc model modifications. While many model editing techniques are promising, research on the properties of edited models is largely limited to evaluation of validati…

    Submitted 17 July, 2023; v1 submitted 28 February, 2023; originally announced March 2023.

    Comments: DB and CG contributed equally

  11. arXiv:2302.09301  [pdf, other]

    cs.CL cs.CV cs.LG

    Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension

    Authors: Henry Kvinge, Davis Brown, Charles Godfrey

    Abstract: Prompting has become an important mechanism by which users can more effectively interact with many flavors of foundation model. Indeed, the last several years have shown that well-honed prompts can sometimes unlock emergent capabilities within such models. While there has been a substantial amount of empirical exploration of prompting within the community, relatively few works have studied prompti…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: 11 pages

  12. arXiv:2302.08495  [pdf, other]

    cs.CV cs.LG

    Parameters, Properties, and Process: Conditional Neural Generation of Realistic SEM Imagery Towards ML-assisted Advanced Manufacturing

    Authors: Scott Howland, Lara Kassab, Keerti Kappagantula, Henry Kvinge, Tegan Emerson

    Abstract: The research and development cycle of advanced manufacturing processes traditionally requires a large investment of time and resources. Experiments can be expensive and are hence conducted on relatively small scales. This poses problems for typically data-hungry machine learning tools which could otherwise expedite the development cycle. We build upon prior work by applying conditional generative…

    Submitted 12 January, 2023; originally announced February 2023.

  13. arXiv:2211.10558  [pdf, other]

    cs.LG cs.CV

    Internal Representations of Vision Models Through the Lens of Frames on Data Manifolds

    Authors: Henry Kvinge, Grayson Jorgenson, Davis Brown, Charles Godfrey, Tegan Emerson

    Abstract: While the last five years have seen considerable progress in understanding the internal representations of deep learning models, many questions remain. This is especially true when trying to understand the impact of model design choices, such as model architecture or training algorithm, on hidden representation geometry and dynamics. In this work we present a new approach to studying such represen…

    Submitted 6 December, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: 30 pages, accepted as an oral presentation at the Workshop on Symmetry and Geometry in Neural Representations at NeurIPS 2023

  14. arXiv:2211.07697  [pdf, other]

    cs.LG cs.CV math.AT

    Do Neural Networks Trained with Topological Features Learn Different Internal Representations?

    Authors: Sarah McGuire, Shane Jackson, Tegan Emerson, Henry Kvinge

    Abstract: There is a growing body of work that leverages features extracted via topological data analysis to train machine learning models. While this field, sometimes known as topological machine learning (TML), has seen some notable successes, an understanding of how the process of learning from topological features differs from the process of learning from raw data is still limited. In this work, we begi…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: To appear at NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations (NeurReps)

  15. arXiv:2210.03773  [pdf, other]

    cs.LG cs.CV

    In What Ways Are Deep Neural Networks Invariant and How Should We Measure This?

    Authors: Henry Kvinge, Tegan H. Emerson, Grayson Jorgenson, Scott Vasquez, Timothy Doster, Jesse D. Lew

    Abstract: It is often said that a deep learning model is "invariant" to some specific type of transformation. However, what is meant by this statement strongly depends on the context in which it is made. In this paper we explore the nature of invariance and equivariance of deep learning models with the goal of better understanding the ways in which they actually capture these concepts on a formal level. We…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: To appear at NeurIPS 2022

  16. arXiv:2210.01257  [pdf, other]

    cs.LG stat.ML

    Testing predictions of representation cost theory with CNNs

    Authors: Charles Godfrey, Elise Bishoff, Myles Mckay, Davis Brown, Grayson Jorgenson, Henry Kvinge, Eleanor Byler

    Abstract: It is widely acknowledged that trained convolutional neural networks (CNNs) have different levels of sensitivity to signals of different frequency. In particular, a number of empirical studies have documented CNNs sensitivity to low-frequency signals. In this work we show with theory and experiments that this observed sensitivity is a consequence of the frequency distribution of natural images, wh…

    Submitted 25 September, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Comments welcome! V2: Conjecture on non-commutative generalized Hölder upgraded to Lemma 4.11, as a consequence restrictions on Theorem 4.9 removed, more datasets, more variable frequency statistics and more CNN architectures. V3: title updated to better reflect content, some new ablations with untrained networks

  17. arXiv:2205.14258  [pdf, other]

    cs.LG cs.AI

    On the Symmetries of Deep Learning Models and their Internal Representations

    Authors: Charles Godfrey, Davis Brown, Tegan Emerson, Henry Kvinge

    Abstract: Symmetry is a fundamental tool in the exploration of a broad range of complex systems. In machine learning symmetry has been explored in both models and data. In this paper we seek to connect the symmetries arising from the architecture of a family of models with the symmetries of that family's internal representation of data. We do this by calculating a set of fundamental symmetry groups, which w…

    Submitted 24 March, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: CG and DB contributed equally. V2: clarified relationship between $μ_{\mathrm{CKA}}$ and existing instances of CKA. V3: more experiments, alternative stitching capacity comparison, GeLU intertwiner group. V4: minor typo corrections. V4: failure of PSD property for max kernel used in $μ_{\mathrm{CKA}}$ (thanks to Derek Lim)

    MSC Class: 68T07 (Primary) 20C35; 62H20 (Secondary) ACM Class: I.2; G.3

  18. arXiv:2204.00629  [pdf, other]

    cond-mat.mtrl-sci cs.CV cs.LG math.AT

    TopTemp: Parsing Precipitate Structure from Temper Topology

    Authors: Lara Kassab, Scott Howland, Henry Kvinge, Keerti Sahithi Kappagantula, Tegan Emerson

    Abstract: Technological advances are in part enabled by the development of novel manufacturing processes that give rise to new materials or material property improvements. Development and evaluation of new manufacturing methodologies is labor-, time-, and resource-intensive expensive due to complex, poorly defined relationships between advanced manufacturing process parameters and the resulting microstructu…

    Submitted 6 May, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    MSC Class: 55N31 (Primary)

  19. arXiv:2203.08189  [pdf, other]

    cs.LG

    Fiber Bundle Morphisms as a Framework for Modeling Many-to-Many Maps

    Authors: Elizabeth Coda, Nico Courts, Colby Wight, Loc Truong, WoongJo Choi, Charles Godfrey, Tegan Emerson, Keerti Kappagantula, Henry Kvinge

    Abstract: While it is not generally reflected in the `nice' datasets used for benchmarking machine learning algorithms, the real-world is full of processes that would be best described as many-to-many. That is, a single input can potentially yield many different outputs (whether due to noise, imperfect measurement, or intrinsic stochasticity in the process) and many different inputs can yield the same outpu…

    Submitted 29 April, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

  20. arXiv:2112.09277  [pdf, other]

    cs.LG

    DNA: Dynamic Network Augmentation

    Authors: Scott Mahan, Tim Doster, Henry Kvinge

    Abstract: In many classification problems, we want a classifier that is robust to a range of non-semantic transformations. For example, a human can identify a dog in a picture regardless of the orientation and pose in which it appears. There is substantial evidence that this kind of invariance can significantly improve the accuracy and generalization of machine learning models. A common technique to teach a…

    Submitted 16 December, 2021; originally announced December 2021.

  21. arXiv:2112.01687  [pdf, other]

    cs.LG cs.AI

    Differential Property Prediction: A Machine Learning Approach to Experimental Design in Advanced Manufacturing

    Authors: Loc Truong, WoongJo Choi, Colby Wight, Lizzy Coda, Tegan Emerson, Keerti Kappagantula, Henry Kvinge

    Abstract: Advanced manufacturing techniques have enabled the production of materials with state-of-the-art properties. In many cases however, the development of physics-based models of these techniques lags behind their use in the lab. This means that designing and running experiments proceeds largely via trial and error. This is sub-optimal since experiments are cost-, time-, and labor-intensive. In this w…

    Submitted 2 December, 2021; originally announced December 2021.

  22. arXiv:2111.10937  [pdf, other]

    cs.LG cs.CV

    Adaptive Transfer Learning: a simple but effective transfer learning

    Authors: Jung H Lee, Henry J Kvinge, Scott Howland, Zachary New, John Buckheit, Lauren A. Phillips, Elliott Skomski, Jessica Hibler, Courtney D. Corley, Nathan O. Hodas

    Abstract: Transfer learning (TL) leverages previously obtained knowledge to learn new tasks efficiently and has been used to train deep learning (DL) models with limited amount of data. When TL is applied to DL, pretrained (teacher) models are fine-tuned to build domain specific (student) models. This fine-tuning relies on the fact that DL model can be decomposed to classifiers and feature extractors, and a…

    Submitted 21 November, 2021; originally announced November 2021.

    Comments: 10 pages, 7 figures

  23. arXiv:2110.07120  [pdf, other]

    cs.LG cs.CV

    Making Corgis Important for Honeycomb Classification: Adversarial Attacks on Concept-based Explainability Tools

    Authors: Davis Brown, Henry Kvinge

    Abstract: Methods for model explainability have become increasingly critical for testing the fairness and soundness of deep learning. Concept-based interpretability techniques, which use a small set of human-interpretable concept exemplars in order to measure the influence of a concept on a model's internal representation of input, are an important thread in this line of research. In this work we show that…

    Submitted 26 July, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: AdvML Frontiers 2022 @ ICML 2022 workshop

  24. arXiv:2110.06983  [pdf, other]

    cs.LG cs.AI math.GT

    Bundle Networks: Fiber Bundles, Local Trivializations, and a Generative Approach to Exploring Many-to-one Maps

    Authors: Nico Courts, Henry Kvinge

    Abstract: Many-to-one maps are ubiquitous in machine learning, from the image recognition model that assigns a multitude of distinct images to the concept of "cat" to the time series forecasting model which assigns a range of distinct time-series to a single scalar regression value. While the primary use of such models is naturally to associate correct output to each input, in many problems it is also usefu…

    Submitted 24 February, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at ICLR 2022; 19 pages

    MSC Class: 53Z50

  25. arXiv:2107.04714  [pdf, other]

    cs.LG cs.CV math.GN

    A Topological-Framework to Improve Analysis of Machine Learning Model Performance

    Authors: Henry Kvinge, Colby Wight, Sarah Akers, Scott Howland, Woongjo Choi, Xiaolong Ma, Luke Gosink, Elizabeth Jurrus, Keerti Kappagantula, Tegan H. Emerson

    Abstract: As both machine learning models and the datasets on which they are evaluated have grown in size and complexity, the practice of using a few summary statistics to understand model performance has become increasingly problematic. This is particularly true in real-world scenarios where understanding model failure on certain subpopulations of the data is of critical importance. In this paper we propos…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: 6 pages

  26. arXiv:2106.04009  [pdf, other]

    cs.LG

    Rotating spiders and reflecting dogs: a class conditional approach to learning data augmentation distributions

    Authors: Scott Mahan, Henry Kvinge, Tim Doster

    Abstract: Building invariance to non-meaningful transformations is essential to building efficient and generalizable machine learning models. In practice, the most common way to learn invariance is through data augmentation. There has been recent interest in the development of methods that learn distributions on augmentation transformations from the training data itself. While such approaches are beneficial…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: 10 pages, 6 figures, submitted to NeurIPS 2021

  27. arXiv:2106.01423  [pdf, other]

    cs.LG cs.AI cs.CV math.MG

    One Representation to Rule Them All: Identifying Out-of-Support Examples in Few-shot Learning with Generic Representations

    Authors: Henry Kvinge, Scott Howland, Nico Courts, Lauren A. Phillips, John Buckheit, Zachary New, Elliott Skomski, Jung H. Lee, Sandeep Tiwari, Jessica Hibler, Courtney D. Corley, Nathan O. Hodas

    Abstract: The field of few-shot learning has made remarkable strides in developing powerful models that can operate in the small data regime. Nearly all of these methods assume every unlabeled instance encountered will belong to a handful of known classes for which one has examples. This can be problematic for real-world use cases where one routinely finds 'none-of-the-above' examples. In this paper we desc…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 15 pages

  28. arXiv:2105.10414  [pdf, other]

    cs.LG cs.CV math.AT math.CT

    Sheaves as a Framework for Understanding and Interpreting Model Fit

    Authors: Henry Kvinge, Brett Jefferson, Cliff Joslyn, Emilie Purvine

    Abstract: As data grows in size and complexity, finding frameworks which aid in interpretation and analysis has become critical. This is particularly true when data comes from complex systems where extensive structure is available, but must be drawn from peripheral sources. In this paper we argue that in such situations, sheaves can provide a natural framework to analyze how well a statistical model fits at…

    Submitted 21 May, 2021; originally announced May 2021.

    Comments: 12 page

  29. arXiv:2104.03496  [pdf, other]

    cs.CV cs.LG cs.NE

    Prototypical Region Proposal Networks for Few-Shot Localization and Classification

    Authors: Elliott Skomski, Aaron Tuor, Andrew Avila, Lauren Phillips, Zachary New, Henry Kvinge, Courtney D. Corley, Nathan Hodas

    Abstract: Recently proposed few-shot image classification methods have generally focused on use cases where the objects to be classified are the central subject of images. Despite success on benchmark vision datasets aligned with this use case, these methods typically fail on use cases involving densely-annotated, busy images: images common in the wild where objects of relevance are not the central subject,…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: 9 pages, 1 figure. Submitted to 4th Workshop on Meta-Learning at NeurIPS 2020

  30. arXiv:2009.11253  [pdf, other]

    cs.LG cs.AI cs.CV math.GN stat.ML

    Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning

    Authors: Henry Kvinge, Zachary New, Nico Courts, Jung H. Lee, Lauren A. Phillips, Courtney D. Corley, Aaron Tuor, Andrew Avila, Nathan O. Hodas

    Abstract: Deep learning has shown great success in settings with massive amounts of data but has struggled when data is limited. Few-shot learning algorithms, which seek to address this limitation, are designed to generalize well to new tasks with limited data. Typically, models are evaluated on unseen classes and datasets that are defined by the same fundamental task as they are trained for (e.g. category…

    Submitted 23 September, 2020; originally announced September 2020.

    Comments: 17 pages

  31. arXiv:1906.11818  [pdf, other]

    eess.IV cs.CV eess.SP

    More chemical detection through less sampling: amplifying chemical signals in hyperspectral data cubes through compressive sensing

    Authors: Henry Kvinge, Elin Farnell, Julia R. Dupuis, Michael Kirby, Chris Peterson, Elizabeth C. Schundler

    Abstract: Compressive sensing (CS) is a method of sampling which permits some classes of signals to be reconstructed with high accuracy even when they were under-sampled. In this paper we explore a phenomenon in which bandwise CS sampling of a hyperspectral data cube followed by reconstruction can actually result in amplification of chemical signals contained in the cube. Perhaps most surprisingly, chemical…

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: 10 pages

  32. arXiv:1906.08869  [pdf, other]

    eess.SP cs.LG eess.IV

    A data-driven approach to sampling matrix selection for compressive sensing

    Authors: Elin Farnell, Henry Kvinge, John P. Dixon, Julia R. Dupuis, Michael Kirby, Chris Peterson, Elizabeth C. Schundler, Christian W. Smith

    Abstract: Sampling is a fundamental aspect of any implementation of compressive sensing. Typically, the choice of sampling method is guided by the reconstruction basis. However, this approach can be problematic with respect to certain hardware constraints and is not responsive to domain-specific context. We propose a method for defining an order for a sampling basis that is optimal with respect to capturing…

    Submitted 20 June, 2019; originally announced June 2019.

    Comments: 15 pages

  33. arXiv:1901.10585  [pdf, other]

    cs.LG cs.CV stat.ML

    Rare geometries: revealing rare categories via dimension-driven statistics

    Authors: Henry Kvinge, Elin Farnell, Jingya Li, Yujia Chen

    Abstract: In many situations, classes of data points of primary interest also happen to be those that are least numerous. A well-known example is detection of fraudulent transactions among the collection of all financial transactions, the vast majority of which are legitimate. These types of problems fall under the label of `rare-category detection.' There are two challenging aspects of these problems. The…

    Submitted 28 May, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: 9 pages. Section IV substantially expanded with minor improvements to other parts of the paper. Two new co-authors responsible for implementation of the algorithm on real data added

  34. arXiv:1812.03362  [pdf, other]

    cs.LG math.CO math.GR math.RT stat.ML

    Multi-Dimensional Scaling on Groups

    Authors: Mark Blumstein, Henry Kvinge

    Abstract: Leveraging the intrinsic symmetries in data for clear and efficient analysis is an important theme in signal processing and other data-driven sciences. A basic example of this is the ubiquity of the discrete Fourier transform which arises from translational symmetry (i.e. time-delay/phase-shift). Particularly important in this area is understanding how symmetries inform the algorithms that we appl…

    Submitted 14 January, 2020; v1 submitted 8 December, 2018; originally announced December 2018.

    Comments: Significantly refined presentation of content. Addition of connections to character theory. New more concise title and abstract. 6 pages

  35. arXiv:1810.11562  [pdf, other]

    cs.LG cs.CV physics.data-an stat.ML

    Monitoring the shape of weather, soundscapes, and dynamical systems: a new statistic for dimension-driven data analysis on large data sets

    Authors: Henry Kvinge, Elin Farnell, Michael Kirby, Chris Peterson

    Abstract: Dimensionality-reduction methods are a fundamental tool in the analysis of large data sets. These algorithms work on the assumption that the "intrinsic dimension" of the data is generally much smaller than the ambient dimension in which it is collected. Alongside their usual purpose of mapping data into a smaller dimension with minimal information loss, dimensionality-reduction techniques implicit…

    Submitted 26 October, 2018; originally announced October 2018.

    Comments: Accepted to the 2018 IEEE International Conference on BIG DATA, 9 pages

  36. arXiv:1808.01686  [pdf, other]

    cs.CV cs.LG eess.IV eess.SP

    Too many secants: a hierarchical approach to secant-based dimensionality reduction on large data sets

    Authors: Henry Kvinge, Elin Farnell, Michael Kirby, Chris Peterson

    Abstract: A fundamental question in many data analysis settings is the problem of discerning the "natural" dimension of a data set. That is, when a data set is drawn from a manifold (possibly with noise), a meaningful aspect of the data is the dimension of that manifold. Various approaches exist for estimating this dimension, such as the method of Secant-Avoidance Projection (SAP). Intuitively, the SAP algo…

    Submitted 5 August, 2018; originally announced August 2018.

    Comments: To appear in the Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, Waltham, MA USA

  37. arXiv:1807.03425  [pdf, other]

    cs.CV cs.LG eess.IV eess.SP

    A GPU-Oriented Algorithm Design for Secant-Based Dimensionality Reduction

    Authors: Henry Kvinge, Elin Farnell, Michael Kirby, Chris Peterson

    Abstract: Dimensionality-reduction techniques are a fundamental tool for extracting useful information from high-dimensional data sets. Because secant sets encode manifold geometry, they are a useful tool for designing meaningful data-reduction algorithms. In one such approach, the goal is to construct a projection that maximally avoids secant directions and hence ensures that distinct data points are not m…

    Submitted 9 July, 2018; originally announced July 2018.

    Comments: To appear in the 17th IEEE International Symposium on Parallel and Distributed Computing, Geneva, Switzerland 2018

  38. arXiv:1807.01401  [pdf, other]

    cs.CV cs.LG eess.IV eess.SP

    Endmember Extraction on the Grassmannian

    Authors: Elin Farnell, Henry Kvinge, Michael Kirby, Chris Peterson

    Abstract: Endmember extraction plays a prominent role in a variety of data analysis problems as endmembers often correspond to data representing the purest or best representative of some feature. Identifying endmembers then can be useful for further identification and classification tasks. In settings with high-dimensional data, such as hyperspectral imagery, it can be useful to consider endmembers that are…

    Submitted 3 July, 2018; originally announced July 2018.

    Comments: To appear in Proceedings of the 2018 IEEE Data Science Workshop, Lausanne, Switzerland