Showing 1–26 of 26 results for author: Wolfe, R

  1. arXiv:2405.16820  [pdf, other]

    cs.LG cs.AI cs.CY cs.HC

    Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings

    Authors: Robert Wolfe, Isaac Slaughter, Bin Han, Bingbing Wen, Yiwei Yang, Lucas Rosenblatt, Bernease Herman, Eva Brown, Zening Qu, Nic Weber, Bill Howe

    Abstract: The rapid proliferation of generative AI has raised questions about the competitiveness of lower-parameter, locally tunable, open-weight models relative to high-parameter, API-guarded, closed-weight models in terms of performance, domain adaptation, cost, and generalization. Centering under-resourced yet risk-intolerant settings in government, research, and healthcare, we see for-profit closed-wei…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted at the ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2024

  2. arXiv:2405.15985  [pdf, other]

    cs.HC cs.AI cs.CY

    The Impact and Opportunities of Generative AI in Fact-Checking

    Authors: Robert Wolfe, Tanushree Mitra

    Abstract: Generative AI appears poised to transform white collar professions, with more than 90% of Fortune 500 companies using OpenAI's flagship GPT models, which have been characterized as "general purpose technologies" capable of effecting epochal changes in the economy. But how will such technologies impact organizations whose job is to verify and report factual information, and to ensure the health of…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: To be published at the ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2024

  3. Better Schedules for Low Precision Training of Deep Neural Networks

    Authors: Cameron R. Wolfe, Anastasios Kyrillidis

    Abstract: Low precision training can significantly reduce the computational overhead of training deep neural networks (DNNs). Though many such techniques exist, cyclic precision training (CPT), which dynamically adjusts precision throughout training according to a cyclic schedule, achieves particularly impressive improvements in training efficiency, while actually improving DNN performance. Existing CPT imp…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 20 pages, 8 figures, 1 table, ACML 2023

    ACM Class: I.2.6; I.2.10; I.4.0

    Journal ref: Machine Learning (2024): 1-19

  4. arXiv:2307.03360  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Evaluating Biased Attitude Associations of Language Models in an Intersectional Context

    Authors: Shiva Omrani Sabbaghi, Robert Wolfe, Aylin Caliskan

    Abstract: Language models are trained on large-scale corpora that embed implicit biases documented in psychology. Valence associations (pleasantness/unpleasantness) of social groups determine the biased attitudes towards groups and concepts in social cognition. Building on this established literature, we quantify how social groups are valenced in English language models using a sentence template that provid…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: to be published in AIES 2023

  5. arXiv:2212.11261  [pdf, other]

    cs.CY cs.AI cs.CL cs.CV cs.LG

    Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias

    Authors: Robert Wolfe, Yiwei Yang, Bill Howe, Aylin Caliskan

    Abstract: Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics, such as emotions, are disregarded and the person is treated as a body. We replicate three experiments in psychology qua…

    Submitted 15 May, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: 12 pages, 4 figures, 2 tables

    Journal ref: ACM FAccT 2023

  6. arXiv:2211.04624  [pdf, other]

    cs.LG cs.CV math.OC

    Cold Start Streaming Learning for Deep Networks

    Authors: Cameron R. Wolfe, Anastasios Kyrillidis

    Abstract: The ability to dynamically adapt neural networks to newly-available data without performance deterioration would revolutionize deep learning applications. Streaming learning (i.e., learning from one data example at a time) has the potential to enable such real-time adaptation, but current approaches i) freeze a majority of network parameters during streaming and ii) are dependent upon offline, bas…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 52 pages, 7 figures, pre-print

    MSC Class: 68T07

    ACM Class: I.2.6; I.2.10; I.4.0

  7. arXiv:2207.00691  [pdf, other]

    cs.CY cs.AI cs.CL cs.CV cs.LG

    American == White in Multimodal Language-and-Image AI

    Authors: Robert Wolfe, Aylin Caliskan

    Abstract: Three state-of-the-art language-and-image AI models, CLIP, SLIP, and BLIP, are evaluated for evidence of a bias previously observed in social and experimental psychology: equating American identity with being White. Embedding association tests (EATs) using standardized images of self-identified Asian, Black, Latina/o, and White individuals from the Chicago Face Database (CFD) reveal that White ind…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted to AI Ethics and Society 2022

  8. arXiv:2206.03390  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics

    Authors: Aylin Caliskan, Pimparkar Parth Ajay, Tessa Charlesworth, Robert Wolfe, Mahzarin R. Banaji

    Abstract: The statistical regularities in language corpora encode well-known social biases into word embeddings. Here, we focus on gender to provide a comprehensive analysis of group-based biases in widely-used static English word embeddings trained on internet corpora (GloVe 2014, fastText 2017). Using the Single-Category Word Embedding Association Test, we demonstrate the widespread prevalence of gender b…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: 15 pages, 6 figures, accepted to AAAI/ACM Artificial Intelligence, Ethics, and Society

  9. arXiv:2205.12484  [pdf, other]

    cs.CL cs.AI

    GisPy: A Tool for Measuring Gist Inference Score in Text

    Authors: Pedram Hosseini, Christopher R. Wolfe, Mona Diab, David A. Broniatowski

    Abstract: Decision making theories such as Fuzzy-Trace Theory (FTT) suggest that individuals tend to rely on gist, or bottom-line meaning, in the text when making decisions. In this work, we delineate the process of developing GisPy, an open-source tool in Python for measuring the Gist Inference Score (GIS) in text. Evaluation of GisPy on documents in three benchmarks from the news and scientific text domai…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted to the 4th Workshop on Narrative Understanding @ NAACL 2022

  10. arXiv:2205.11378  [pdf, other]

    cs.CV cs.AI cs.CL cs.CY cs.LG

    Markedness in Visual Semantic AI

    Authors: Robert Wolfe, Aylin Caliskan

    Abstract: We evaluate the state-of-the-art multimodal "visual semantic" model CLIP ("Contrastive Language Image Pretraining") for biases related to the marking of age, gender, and race or ethnicity. Given the option to label an image as "a photo of a person" or to select a label denoting race or ethnicity, CLIP chooses the "person" label 47.9% of the time for White individuals, compared with 5.0% or less fo…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: To be published at ACM FAccT 2022

  11. arXiv:2205.10764  [pdf, other]

    cs.CV cs.AI cs.CL cs.CY

    Evidence for Hypodescent in Visual Semantic AI

    Authors: Robert Wolfe, Mahzarin R. Banaji, Aylin Caliskan

    Abstract: We examine the state-of-the-art multimodal "visual semantic" model CLIP ("Contrastive Language Image Pretraining") for the rule of hypodescent, or one-drop rule, whereby multiracial people are more likely to be assigned a racial or ethnic label corresponding to a minority or disadvantaged racial or ethnic group than to the equivalent majority or advantaged group. A face morphing experiment grounde…

    Submitted 22 May, 2022; originally announced May 2022.

    Comments: To be published at ACM FAccT 2022

  12. arXiv:2203.10428  [pdf, other]

    cs.LG cs.AI

    PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication

    Authors: Cheng Wan, Youjie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, Yingyan Lin

    Abstract: Graph Convolutional Networks (GCNs) is the state-of-the-art method for learning graph-structured data, and training large-scale GCNs requires distributed training across multiple accelerators such that each accelerator is able to hold a partitioned subgraph. However, distributed GCN training incurs prohibitive overhead of communicating node features and feature gradients among partitions for every…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: ICLR 2022

  13. arXiv:2203.07511  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Contrastive Visual Semantic Pretraining Magnifies the Semantics of Natural Language Representations

    Authors: Robert Wolfe, Aylin Caliskan

    Abstract: We examine the effects of contrastive visual semantic pretraining by comparing the geometry and semantic properties of contextualized English language representations formed by GPT-2 and CLIP, a zero-shot multimodal image classifier which adapts the GPT-2 architecture to encode image captions. We find that contrastive visual semantic pretraining significantly mitigates the anisotropy found in cont…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: To be published in ACL 2022

  14. arXiv:2203.07504  [pdf, ps, other]

    cs.CL cs.AI cs.CY cs.LG

    VAST: The Valence-Assessing Semantics Test for Contextualizing Language Models

    Authors: Robert Wolfe, Aylin Caliskan

    Abstract: VAST, the Valence-Assessing Semantics Test, is a novel intrinsic evaluation task for contextualized word embeddings (CWEs). VAST uses valence, the association of a word with pleasantness, to measure the correspondence of word-level LM semantics with widely used human judgments, and examines the effects of contextualization, tokenization, and LM-specific geometry. Because prior research has found t…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: To be published in AAAI 2022

  15. arXiv:2112.04905  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery

    Authors: Cameron R. Wolfe, Anastasios Kyrillidis

    Abstract: We propose a novel, structured pruning algorithm for neural networks -- the iterative, Sparse Structured Pruning algorithm, dubbed as i-SpaSP. Inspired by ideas from sparse signal recovery, i-SpaSP operates by iteratively identifying a larger set of important parameter groups (e.g., filters or neurons) within a network that contribute most to the residual between pruned and dense network output, t…

    Submitted 29 March, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

    Comments: 29 pages, 4 figures, 4th Annual Conference on Learning for Dynamics and Control

    MSC Class: 68T07

    ACM Class: I.2.6; I.2.10; I.4.0

  16. arXiv:2110.00672  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models

    Authors: Robert Wolfe, Aylin Caliskan

    Abstract: We use a dataset of U.S. first names with labels based on predominant gender and racial group to examine the effect of training corpus frequency on tokenization, contextualization, similarity to initial representation, and bias in BERT, GPT-2, T5, and XLNet. We show that predominantly female and non-white names are less frequent in the training corpora of these four language models. We find that i…

    Submitted 1 October, 2021; originally announced October 2021.

    Comments: 15 pages, 3 figures, 8 tables

    Journal ref: Empirical Methods in Natural Language Processing 2021

  17. arXiv:2108.00259  [pdf, other]

    stat.ML cs.AI cs.LG math.OC

    How much pre-training is enough to discover a good subnetwork?

    Authors: Cameron R. Wolfe, Fangshuo Liao, Qihan Wang, Junhyung Lyle Kim, Anastasios Kyrillidis

    Abstract: Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures. More often than not, it involves a three-step process -- pre-training, pruning, and re-training -- that is computationally expensive, as the dense model must be fully pre-trained. While previous work has revealed through experiments the relationship between the a…

    Submitted 22 August, 2023; v1 submitted 31 July, 2021; originally announced August 2021.

    Comments: 29 pages

    MSC Class: 68T07

    ACM Class: I.2.6; I.2.10; I.4.0

  18. arXiv:2107.13054  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Exceeding the Limits of Visual-Linguistic Multi-Task Learning

    Authors: Cameron R. Wolfe, Keld T. Lundgaard

    Abstract: By leveraging large amounts of product data collected across hundreds of live e-commerce websites, we construct 1000 unique classification tasks that share similarly-structured input data, comprised of both text and images. These classification tasks focus on learning the product hierarchy of different e-commerce websites, causing many of them to be correlated. Adopting a multi-modal transformer m…

    Submitted 27 July, 2021; originally announced July 2021.

    Comments: 10 pages, 7 figures

    MSC Class: 68T07

    ACM Class: I.2.6; I.2.7; I.2.10

  19. arXiv:2107.04197  [pdf, other]

    cs.LG

    REX: Revisiting Budgeted Training with an Improved Schedule

    Authors: John Chen, Cameron Wolfe, Anastasios Kyrillidis

    Abstract: Deep learning practitioners often operate on a computational and monetary budget. Thus, it is critical to design optimization algorithms that perform well under any budget. The linear learning rate schedule is considered the best budget-aware schedule, as it outperforms most other schedules in the low budget regime. On the other hand, learning rate schedules -- such as the 30-60-90 step s…

    Submitted 9 July, 2021; originally announced July 2021.

  20. arXiv:2107.00961  [pdf, other]

    cs.LG cs.CV cs.DC math.OC

    ResIST: Layer-Wise Decomposition of ResNets for Distributed Training

    Authors: Chen Dun, Cameron R. Wolfe, Christopher M. Jermaine, Anastasios Kyrillidis

    Abstract: We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets). ResIST randomly decomposes a global ResNet into several shallow sub-ResNets that are trained independently in a distributed manner for several local iterations, before having their updates synchronized and aggregated into the global model. In the next round, new sub-ResNets are randomly generated and the proc…

    Submitted 14 March, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: 26 pages, 8 figures, pre-print under review

  21. arXiv:2106.08775  [pdf, other]

    math.OC cs.IT cs.LG cs.MS stat.ML

    Momentum-inspired Low-Rank Coordinate Descent for Diagonally Constrained SDPs

    Authors: Junhyung Lyle Kim, JA Lara Benitez, Mohammad Taha Toghani, Cameron Wolfe, Zhiwei Zhang, Anastasios Kyrillidis

    Abstract: We present a novel, practical, and provable approach for solving diagonally constrained semi-definite programming (SDP) problems at scale using accelerated non-convex programming. Our algorithm non-trivially combines acceleration motions from convex optimization with coordinate power iteration and matrix factorization techniques. The algorithm is extremely simple to implement, and adds only a sing…

    Submitted 2 July, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: 10 pages, 8 figures, preprint under review

    MSC Class: 49-02

    ACM Class: F.2.1; G.4

  22. arXiv:2102.10424  [pdf, other]

    cs.LG cs.AI cs.DC math.OC

    GIST: Distributed Training for Large-Scale Graph Convolutional Networks

    Authors: Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis

    Abstract: The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters. Although some work has explored training on large-scale graphs (e.g., GraphSAGE, ClusterGCN, etc.), we pioneer efficient training of large-scale GCN models (i.e., ultra-wide, overparameterized mo…

    Submitted 14 March, 2022; v1 submitted 20 February, 2021; originally announced February 2021.

    Comments: 28 pages, 5 figures, pre-print under review

    ACM Class: I.2.4

  23. arXiv:1912.00772  [pdf, other]

    cs.LG stat.ML

    E-Stitchup: Data Augmentation for Pre-Trained Embeddings

    Authors: Cameron R. Wolfe, Keld T. Lundgaard

    Abstract: In this work, we propose data augmentation methods for embeddings from pre-trained deep learning models that take a weighted combination of a pair of input embeddings, as inspired by Mixup, and combine such augmentation with extra label softening. These methods are shown to significantly increase classification accuracy, reduce training time, and improve confidence calibration of a downstream mode…

    Submitted 6 October, 2020; v1 submitted 27 November, 2019; originally announced December 2019.

    Comments: 11 pages, 7 figures

  24. arXiv:1910.02120  [pdf, other]

    cs.LG stat.ML

    Distributed Learning of Deep Neural Networks using Independent Subnet Training

    Authors: Binhang Yuan, Cameron R. Wolfe, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine

    Abstract: Distributed machine learning (ML) can bring more computational resources to bear than single-machine learning, thus enabling reductions in training time. Distributed learning partitions models and data over many machines, allowing model and dataset sizes beyond the available compute power and memory of a single machine. In practice though, distributed ML is challenging when distribution is mandato…

    Submitted 18 April, 2022; v1 submitted 4 October, 2019; originally announced October 2019.

  25. arXiv:1903.10103  [pdf, other]

    cs.NE

    Functional Generative Design of Mechanisms with Recurrent Neural Networks and Novelty Search

    Authors: Cameron R. Wolfe, Cem C. Tutum, Risto Miikkulainen

    Abstract: Consumer-grade 3D printers have made it easier to fabricate aesthetic objects and static assemblies, opening the door to automated design of such objects. However, while static designs are easily produced with 3D printing, functional designs with moving parts are more difficult to generate: The search space is too high-dimensional, the resolution of the 3D-printed parts is not adequate, and it is…

    Submitted 24 March, 2019; originally announced March 2019.

    Comments: 7 pages, GECCO 2019

  26. arXiv:0710.4636  [pdf]

    cs.AR

    Why Systems-on-Chip Needs More UML like a Hole in the Head

    Authors: Stephen J. Mellor, John R. Wolfe, Campbell Mccausland

    Abstract: Let's be clear from the outset: SoC can most certainly make use of UML; SoC just doesn't need more UML, or even all of it. The advent of model mappings, coupled with marks that indicate which mapping rule to apply, enable a major simplification of the use of UML in SoC.

    Submitted 25 October, 2007; originally announced October 2007.

    Comments: Submitted on behalf of EDAA (http://www.edaa.com/)

    Journal ref: In Design, Automation and Test in Europe - DATE'05, Munich, Germany (2005)