-
Enantiospecificity in NMR Enabled by Chirality-Induced Spin Selectivity
Authors:
T. Georgiou,
J. L. Palma,
V. Mujica,
S. Varela,
M. Galante,
V. Santamaría García,
L. Mboning,
R. N. Schwartz,
G. Cuniberti,
L. -S. Bouchard
Abstract:
Spin polarization in chiral molecules is a magnetic molecular response associated with electron transport and enantioselective bond polarization that occurs even in the absence of an external magnetic field. An unexpected finding by Santos and co-workers reported enantiospecific NMR responses in solid-state cross-polarization (CP) experiments, suggesting a possible additional contribution to the indirect nuclear spin-spin coupling in chiral molecules induced by bond polarization in the presence of spin-orbit coupling. Herein we provide a theoretical treatment for this phenomenon, presenting an effective spin-Hamiltonian for helical molecules like DNA and density functional theory (DFT) results on amino acids that confirm the dependence of J-couplings on the choice of enantiomer. The connection between nuclear spin dynamics and chirality could offer insights for molecular sensing and quantum information sciences. These results establish NMR as a potential tool for chiral discrimination without external agents.
Submitted 2 July, 2024; v1 submitted 30 June, 2024;
originally announced July 2024.
-
An Automated SQL Query Grading System Using An Attention-Based Convolutional Neural Network
Authors:
Donald R. Schwartz,
Pablo Rivas
Abstract:
Grading SQL queries can be a time-consuming, tedious and challenging task, especially as the number of student submissions increases. Several systems have been introduced in an attempt to mitigate these challenges, but those systems have their own limitations. This paper describes our novel approach to automating the process of grading SQL queries. Unlike previous approaches, we use a convolutional neural network architecture that shares parameters across different machine learning tasks, which enables the architecture to induce different knowledge representations of the data and increases its potential for understanding SQL statements.
Submitted 22 June, 2024;
originally announced June 2024.
-
FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography
Authors:
Julia Yang,
Alina Jade Barnett,
Jon Donnelly,
Satvik Kishore,
Jerry Fang,
Fides Regina Schwartz,
Chaofan Chen,
Joseph Y. Lo,
Cynthia Rudin
Abstract:
Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable and there is a call in these fields to make interpretable models. Recent work in interpretable computer vision provides transparency to these formerly black boxes by utilizing prototypes for case-based explanations, achieving high accuracy in applications including mammography. However, these models struggle with precise feature localization, reasoning on large portions of an image when only a small part is relevant. This paper addresses this gap by proposing a novel multi-scale interpretable deep learning model for mammographic mass margin classification. Our contribution not only offers an interpretable model with reasoning aligned with radiologist practices, but also provides a general architecture for computer vision with user-configurable prototypes from coarse- to fine-grained prototypes.
Submitted 10 June, 2024;
originally announced June 2024.
-
What Can Natural Language Processing Do for Peer Review?
Authors:
Ilia Kuznetsov,
Osama Mohammed Afzal,
Koen Dercksen,
Nils Dycke,
Alexander Goldberg,
Tom Hope,
Dirk Hovy,
Jonathan K. Kummerfeld,
Anne Lauscher,
Kevin Leyton-Brown,
Sheng Lu,
Mausam,
Margot Mieskes,
Aurélie Névéol,
Danish Pruthi,
Lizhen Qu,
Roy Schwartz,
Noah A. Smith,
Thamar Solorio,
Jingyan Wang,
Xiaodan Zhu,
Anna Rogers,
Nihar B. Shah,
Iryna Gurevych
Abstract:
The number of scientific articles produced every year is growing rapidly. Providing quality control over them is crucial for scientists and, ultimately, for the public good. In modern science, this process is largely delegated to peer review -- a distributed procedure in which each submission is evaluated by several independent experts in the field. Peer review is widely used, yet it is hard, time-consuming, and prone to error. Since the artifacts involved in peer review -- manuscripts, reviews, discussions -- are largely text-based, Natural Language Processing has great potential to improve reviewing. As the emergence of large language models (LLMs) has enabled NLP assistance for many new tasks, the discussion on machine-assisted peer review is picking up the pace. Yet, where exactly is help needed, where can NLP help, and where should it stand aside? The goal of our paper is to provide a foundation for the future efforts in NLP for peer-reviewing assistance. We discuss peer review as a general process, exemplified by reviewing at AI conferences. We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance, illustrated by existing work. We then turn to the big challenges in NLP for peer review as a whole, including data acquisition and licensing, operationalization and experimentation, and ethical issues. To help consolidate community efforts, we create a companion repository that aggregates key datasets pertaining to peer review. Finally, we issue a detailed call for action for the scientific community, NLP and AI researchers, policymakers, and funding bodies to help bring the research in NLP for peer review forward. We hope that our work will help set the agenda for research in machine-assisted scientific quality control in the age of AI, within the NLP community and beyond.
Submitted 10 May, 2024;
originally announced May 2024.
-
Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models
Authors:
Jonathan Mamou,
Oren Pereg,
Daniel Korat,
Moshe Berchansky,
Nadav Timor,
Moshe Wasserblat,
Roy Schwartz
Abstract:
Speculative decoding is commonly used for reducing the inference latency of large language models. Its effectiveness depends highly on the speculation lookahead (SL), the number of tokens generated by the draft model at each iteration. In this work we show that the common practice of using the same SL for all iterations (static SL) is suboptimal. We introduce DISCO (DynamIc SpeCulation lookahead Optimization), a novel method for dynamically selecting the SL. Our experiments with four datasets show that DISCO reaches an average speedup of 10% compared to the best static SL baseline, while generating the exact same text.
Submitted 23 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Beyond Performance: Quantifying and Mitigating Label Bias in LLMs
Authors:
Yuval Reif,
Roy Schwartz
Abstract:
Large language models (LLMs) have shown remarkable adaptability to diverse tasks, by leveraging context prompts containing instructions, or minimal input-output examples. However, recent work revealed they also exhibit label bias -- an undesirable preference toward predicting certain answers over others. Still, detecting and measuring this bias reliably and at scale has remained relatively unexplored. In this study, we evaluate different approaches to quantifying label bias in a model's predictions, conducting a comprehensive investigation across 279 classification tasks and ten LLMs. Our investigation reveals substantial label bias in models both before and after debiasing attempts, as well as highlights the importance of outcomes-based evaluation metrics, which were not previously used in this regard. We further propose a novel label bias calibration method tailored for few-shot prompting, which outperforms recent calibration approaches for both improving performance and mitigating label bias. Our results emphasize that label bias in the predictions of LLMs remains a barrier to their reliability.
Submitted 4 May, 2024;
originally announced May 2024.
-
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
Authors:
Michael Hassid,
Tal Remez,
Jonas Gehring,
Roy Schwartz,
Yossi Adi
Abstract:
It is a common belief that large language models (LLMs) are better than smaller-sized ones. However, larger models also require significantly more time and compute during inference. This raises the question: what happens when both models operate under the same budget (e.g., compute, run-time)? To address this question, we analyze code generation LLMs of various sizes and make comparisons such as running a 70B model once vs. generating five outputs from a 13B model and selecting one. Our findings reveal that, in a standard unit-test setup, the repeated use of smaller models can yield consistent improvements, with gains of up to 15% across five tasks. On the other hand, in scenarios where unit-tests are unavailable, a ranking-based selection of candidates from the smaller model falls short of the performance of a single output from larger ones. Our results highlight the potential of using smaller models instead of larger ones, and the importance of studying approaches for ranking LLM outputs.
Submitted 31 March, 2024;
originally announced April 2024.
-
The Flapping Birds in the Pentagram Zoo
Authors:
Richard Evan Schwartz
Abstract:
We study the $(k+1,k)$ diagonal map for $k=2,3,4,...$. We call this map $Δ_k$. The map $Δ_1$ is the pentagram map and $Δ_k$ is a generalization. $Δ_k$ does not preserve convexity, but we prove that $Δ_k$ preserves a subset $B_k$ of certain star-shaped polygons which we call $k$-birds. The action of $Δ_k$ on $B_k$ seems similar to the action of $Δ_1$ on the space of convex polygons. We show that some classic geometric results about $Δ_1$ generalize to this setting.
Submitted 8 March, 2024;
originally announced March 2024.
-
A New Database of Giant Impacts over a Wide Range of Masses and with Material Strength: A First Analysis of Outcomes
Authors:
Alexandre Emsenhuber,
Erik Asphaug,
Saverio Cambioni,
Travis S. J. Gabriel,
Stephen R. Schwartz,
Robert E. Melikyan,
C. Adeene Denton
Abstract:
In the late stage of terrestrial planet formation, planets are predicted to undergo pairwise collisions known as giant impacts. Here we present a high-resolution database of giant impacts for differentiated colliding bodies of iron-silicate composition, with target masses ranging from 10^-4 M_Earth up to super-Earths (5 M_Earth). We vary impactor-to-target mass ratio, core-mantle (iron-silicate) fraction, impact velocity, and impact angle. Strength in the form of friction is included in all simulations. We find that due to strength, collisions with bodies smaller than about 2*10^-3 M_Earth can result in irregular shapes, compound core structures, and captured binaries. We observe that the characteristic escaping velocity of smaller remnants (debris) is approximately half of the impact velocity, significantly faster than currently assumed in N-body simulations of planet formation. Incorporating these results in N-body planet formation studies would provide more realistic debris-debris and debris-planet interactions.
Submitted 30 January, 2024;
originally announced January 2024.
-
Transformers are Multi-State RNNs
Authors:
Matanel Oren,
Michael Hassid,
Nir Yarden,
Yossi Adi,
Roy Schwartz
Abstract:
Transformers are considered conceptually different from the previous generation of state-of-the-art NLP models - recurrent neural networks (RNNs). In this work, we demonstrate that decoder-only transformers can in fact be conceptualized as unbounded multi-state RNNs - an RNN variant with unlimited hidden state size. We further show that transformers can be converted into $\textit{bounded}$ multi-state RNNs by fixing the size of their hidden state, effectively compressing their key-value cache. We introduce a novel, training-free compression policy - $\textbf{T}$oken $\textbf{O}$mission $\textbf{V}$ia $\textbf{A}$ttention (TOVA). Our experiments with four long range tasks and several LLMs show that TOVA outperforms several baseline compression policies. Particularly, our results are nearly on par with the full model, using in some cases only $\frac{1}{8}$ of the original cache size, which translates to 4.8X higher throughput. Our results shed light on the connection between transformers and RNNs, and help mitigate one of LLMs' most painful computational bottlenecks - the size of their key-value cache. We publicly release our code at https://github.com/schwartz-lab-NLP/TOVA
Submitted 18 June, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
The SARAO MeerKAT 1.3 GHz Galactic Plane Survey
Authors:
S. Goedhart,
W. D. Cotton,
F. Camilo,
M. A. Thompson,
G. Umana,
M. Bietenholz,
P. A. Woudt,
L. D. Anderson,
C. Bordiu,
D. A. H. Buckley,
C. S. Buemi,
F. Bufano,
F. Cavallaro,
H. Chen,
J. O. Chibueze,
D. Egbo,
B. S. Frank,
M. G. Hoare,
A. Ingallinera,
T. Irabor,
R. C. Kraan-Korteweg,
S. Kurapati,
P. Leto,
S. Loru,
M. Mutale
, et al. (105 additional authors not shown)
Abstract:
We present the SARAO MeerKAT Galactic Plane Survey (SMGPS), a 1.3 GHz continuum survey of almost half of the Galactic Plane (251° $\le l \le$ 358° and 2° $\le l \le$ 61° at $|b| \le 1.5°$). SMGPS is the largest, most sensitive and highest angular resolution 1 GHz survey of the Plane yet carried out, with an angular resolution of 8" and a broadband RMS sensitivity of $\sim$10--20 $μ$Jy/beam. Here we describe the first publicly available data release from SMGPS which comprises data cubes of frequency-resolved images over 908--1656 MHz, power law fits to the images, and broadband zeroth moment integrated intensity images. A thorough assessment of the data quality and guidance for future usage of the data products are given. Finally, we discuss the tremendous potential of SMGPS by showcasing highlights of the Galactic and extragalactic science that it permits. These highlights include the discovery of a new population of non-thermal radio filaments; identification of new candidate supernova remnants, pulsar wind nebulae and planetary nebulae; improved radio/mid-IR classification of rare Luminous Blue Variables and discovery of associated extended radio nebulae; new radio stars identified by Bayesian cross-matching techniques; the realisation that many of the largest radio-quiet WISE HII region candidates are not true HII regions; and a large sample of previously undiscovered background HI galaxies in the Zone of Avoidance.
Submitted 2 May, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
On Approximating Cutwidth and Pathwidth
Authors:
Nikhil Bansal,
Dor Katzelnick,
Roy Schwartz
Abstract:
We study graph ordering problems with a min-max objective. A classical problem of this type is cutwidth, where given a graph we want to order its vertices such that the number of edges crossing any point is minimized. We give a $ \log^{1+o(1)}(n)$ approximation for the problem, substantially improving upon the previous poly-logarithmic guarantees based on the standard recursive balanced partitioning approach of Leighton and Rao (FOCS'88). Our key idea is a new metric decomposition procedure that is suitable for handling min-max objectives, which could be of independent interest. We also use this to show other results, including an improved $ \log^{1+o(1)}(n)$ approximation for computing the pathwidth of a graph.
Submitted 12 April, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
The Potential of Wearable Sensors for Assessing Patient Acuity in Intensive Care Unit (ICU)
Authors:
Jessica Sena,
Mohammad Tahsin Mostafiz,
Jiaqing Zhang,
Andrea Davidson,
Sabyasachi Bandyopadhyay,
Ren Yuanfang,
Tezcan Ozrazgat-Baslanti,
Benjamin Shickel,
Tyler Loftus,
William Robson Schwartz,
Azra Bihorac,
Parisa Rashidi
Abstract:
Acuity assessments are vital in critical care settings to provide timely interventions and fair resource allocation. Traditional acuity scores rely on manual assessments and documentation of physiological states, which can be time-consuming, intermittent, and difficult to use for healthcare providers. Furthermore, such scores do not incorporate granular information such as patients' mobility level, which can indicate recovery or deterioration in the ICU. We hypothesized that existing acuity scores could be potentially improved by employing Artificial Intelligence (AI) techniques in conjunction with Electronic Health Records (EHR) and wearable sensor data. In this study, we evaluated the impact of integrating mobility data collected from wrist-worn accelerometers with clinical data obtained from EHR for developing an AI-driven acuity assessment score. Accelerometry data were collected from 86 patients wearing accelerometers on their wrists in an academic hospital setting. The data were analyzed using five deep neural network models: VGG, ResNet, MobileNet, SqueezeNet, and a custom Transformer network. These models outperformed a rule-based clinical score (SOFA = Sequential Organ Failure Assessment) used as a baseline, particularly regarding the precision, sensitivity, and F1 score. The results showed that while a model relying solely on accelerometer data achieved limited performance (AUC 0.50, Precision 0.61, and F1-score 0.68), including demographic information with the accelerometer data led to a notable enhancement in performance (AUC 0.69, Precision 0.75, and F1-score 0.67). This work shows that the combination of mobility and patient information can successfully differentiate between stable and unstable states in critically ill patients.
Submitted 3 November, 2023;
originally announced November 2023.
-
Open-Set Face Recognition with Maximal Entropy and Objectosphere Loss
Authors:
Rafael Henrique Vareto,
Yu Linghu,
Terrance E. Boult,
William Robson Schwartz,
Manuel Günther
Abstract:
Open-set face recognition characterizes a scenario where unknown individuals, unseen during the training and enrollment stages, appear at operation time. This work concentrates on watchlists, an open-set task that is expected to operate at a low False Positive Identification Rate and generally includes only a few enrollment samples per identity. We introduce a compact adapter network that benefits from additional negative face images when combined with distinct cost functions, such as Objectosphere Loss (OS) and the proposed Maximal Entropy Loss (MEL). MEL modifies the traditional Cross-Entropy loss in favor of increasing the entropy for negative samples and attaches a penalty to known target classes in pursuance of gallery specialization. The proposed approach adopts pre-trained deep neural networks (DNNs) for face recognition as feature extractors. Then, the adapter network takes deep feature representations and acts as a substitute for the output layer of the pre-trained DNN in exchange for an agile domain adaptation. Promising results have been achieved following open-set protocols for three different datasets: LFW, IJB-C, and UCCS, as well as state-of-the-art performance when supplementary negative data is properly selected to fine-tune the adapter network.
Submitted 1 November, 2023;
originally announced November 2023.
-
Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition
Authors:
Isaac Slaughter,
Craig Greenberg,
Reva Schwartz,
Aylin Caliskan
Abstract:
Previous work has established that a person's demographics and speech style affect how well speech processing models perform for them. But where does this bias come from? In this work, we present the Speech Embedding Association Test (SpEAT), a method for detecting bias in one type of model used for many speech tasks: pre-trained models. The SpEAT is inspired by word embedding association tests in natural language processing, which quantify intrinsic bias in a model's representations of different concepts, such as race or valence (something's pleasantness or unpleasantness) and capture the extent to which a model trained on large-scale socio-cultural data has learned human-like biases. Using the SpEAT, we test for six types of bias in 16 English speech models (including 4 models also trained on multilingual data), which come from the wav2vec 2.0, HuBERT, WavLM, and Whisper model families. We find that 14 or more models reveal positive valence (pleasantness) associations with abled people over disabled people, with European-Americans over African-Americans, with females over males, with U.S. accented speakers over non-U.S. accented speakers, and with younger people over older people. Beyond establishing that pre-trained speech models contain these biases, we also show that they can have real world effects. We compare biases found in pre-trained models to biases in downstream models adapted to the task of Speech Emotion Recognition (SER) and find that in 66 of the 96 tests performed (69%), the group that is more associated with positive valence as indicated by the SpEAT also tends to be predicted as speaking with higher valence by the downstream model. Our work provides evidence that, like text and image-based models, pre-trained speech-based models frequently learn human-like biases. Our work also shows that bias found in pre-trained models can propagate to the downstream task of SER.
Submitted 28 October, 2023;
originally announced October 2023.
-
The Crisscross and the Cup: Two Short 3-Twist Paper Moebius Bands
Authors:
Brienne Elisabeth Brown,
Richard Evan Schwartz
Abstract:
We introduce the crisscross and the cup, both of which are immersed $3$-twist polygonal paper Moebius bands of aspect ratio $3$. We explain why these two objects are limits of smooth embedded paper Moebius bands having knotted boundary. We conjecture that any smooth embedded paper Moebius band with knotted boundary has aspect ratio greater than $3$. The crisscross is planar but the cup is not.
Submitted 15 October, 2023;
originally announced October 2023.
-
The Optimal Twisted Paper Cylinder
Authors:
Richard Evan Schwartz
Abstract:
A smooth twisted paper cylinder of aspect ratio $λ$ is an isometric embedding of a $1 \times λ$ cylinder into $\pmb{R}^3$ such that the images of the boundary components are linked. We prove that for such an object to exist we must have $λ>2$ and that this bound is sharp. We also show that any sequence of examples having aspect ratio converging to $2$ must converge, up to isometries, to a certain $4$-fold wrapping of a right-angled isosceles triangle.
Submitted 14 October, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Rapid spin depolarization in the layered 2D Ruddlesden Popper perovskite (BA)(MA)PbI
Authors:
Michael Kempf,
Philipp Moser,
Maximilian Tomoscheit,
Julian Schröer,
Jean-Christophe Blancon,
Rico Schwartz,
Swarup Deb,
Aditya Mohite,
Andreas V. Stier,
Jonathan J. Finley,
Tobias Korn
Abstract:
We report temperature-dependent spectroscopy on the layered (n=4) two-dimensional (2D) Ruddlesden-Popper perovskite (BA)(MA)PbI. Helicity-resolved steady-state photoluminescence (PL) reveals no optical degree of polarization. Time-resolved PL shows a photocarrier lifetime on the order of nanoseconds. From simultaneously recorded time-resolved differential reflectivity (TR$Δ$R) and time-resolved Kerr ellipticity (TRKE), a photocarrier lifetime of a few nanoseconds and a spin dephasing time on the order of picoseconds were found. This stark contrast in lifetimes clearly explains the lack of spin polarization in steady-state PL. While we observe clear temperature-dependent effects on the PL dynamics that can be related to structural dynamics, the spin dephasing is nearly T-independent. Our results highlight that spin dephasing in 2D (BA)(MA)PbI occurs at time scales faster than the exciton recombination time, which poses a bottleneck for applications aiming to utilize this degree of freedom.
Submitted 18 September, 2023;
originally announced September 2023.
-
Towards Trustworthy Artificial Intelligence for Equitable Global Health
Authors:
Hong Qin,
Jude Kong,
Wandi Ding,
Ramneek Ahluwalia,
Christo El Morr,
Zeynep Engin,
Jake Okechukwu Effoduh,
Rebecca Hwa,
Serena Jingchuan Guo,
Laleh Seyyed-Kalantari,
Sylvia Kiwuwa Muyingo,
Candace Makeda Moore,
Ravi Parikh,
Reva Schwartz,
Dongxiao Zhu,
Xiaoqian Wang,
Yiye Zhang
Abstract:
Artificial intelligence (AI) can potentially transform global health, but algorithmic bias can exacerbate social inequities and disparity. Trustworthy AI entails the intentional design to ensure equity and mitigate potential biases. To advance trustworthy AI in global health, we convened a workshop on Fairness in Machine Intelligence for Global Health (FairMI4GH). The event brought together a global mix of experts from various disciplines, community health practitioners, policymakers, and more. Topics covered included managing AI bias in socio-technical systems, AI's potential impacts on global health, and balancing data privacy with transparency. Panel discussions examined the cultural, political, and ethical dimensions of AI in global health. FairMI4GH aimed to stimulate dialogue, facilitate knowledge transfer, and spark innovative solutions. Drawing from NIST's AI Risk Management Framework, it provided suggestions for handling AI risks and biases. The need to mitigate data biases from the research design stage, adopt a human-centered approach, and advocate for AI transparency was recognized. Challenges such as updating legal frameworks, managing cross-border data sharing, and motivating developers to reduce bias were acknowledged. The event emphasized the necessity of diverse viewpoints and multi-dimensional dialogue for creating a fair and ethical AI framework for equitable global health.
Submitted 10 September, 2023;
originally announced September 2023.
-
The Optimal Paper Moebius Band
Authors:
Richard Evan Schwartz
Abstract:
In this paper we prove that a smooth embedded paper Moebius band must have aspect ratio greater than $\sqrt 3$. We also prove that any sequence of smooth embedded paper Moebius bands whose aspect ratio converges to $\sqrt 3$ must converge, up to isometry, to the famous triangular Moebius band. These results answer the minimum aspect ratio question discussed by W. Wunderlich in 1962 and prove the more specific conjecture of B. Halpern and C. Weaver from 1977.
Submitted 21 May, 2024; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Open-set Face Recognition with Neural Ensemble, Maximal Entropy Loss and Feature Augmentation
Authors:
Rafael Henrique Vareto,
Manuel Günther,
William Robson Schwartz
Abstract:
Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects. Therefore, they are expected to prevent face samples of unregistered subjects from being identified as previously enrolled identities. This watchlist context adds an arduous requirement that calls for the dismissal of irrelevant faces by focusing mainly on subjects of interest. As a response, this work introduces a novel method that associates an ensemble of compact neural networks with a margin-based cost function that explores additional samples. Supplementary negative samples can be obtained from external databases or synthetically built at the representation level in training time with a new mix-up feature augmentation approach. Deep neural networks pre-trained on large face datasets serve as the preliminary feature extraction module. We carry out experiments on well-known LFW and IJB-C datasets where results show that the approach is able to boost closed and open-set identification rates.
Submitted 23 August, 2023;
originally announced August 2023.
-
A Tight Competitive Ratio for Online Submodular Welfare Maximization
Authors:
Amit Ganz,
Pranav Nuti,
Roy Schwartz
Abstract:
In this paper we consider the online Submodular Welfare (SW) problem. In this problem we are given $n$ bidders each equipped with a general (not necessarily monotone) submodular utility and $m$ items that arrive online. The goal is to assign each item, once it arrives, to a bidder or discard it, while maximizing the sum of utilities. When an adversary determines the items' arrival order we present a simple randomized algorithm that achieves a tight competitive ratio of $\nicefrac{1}{4}$. The algorithm is a specialization of an algorithm due to [Harshaw-Kazemi-Feldman-Karbasi MOR`22], who presented the previously best known competitive ratio of $3-2\sqrt{2}\approx 0.171573 $ to the problem. When the items' arrival order is uniformly random, we present a competitive ratio of $\approx 0.27493$, improving the previously known $\nicefrac{1}{4}$ guarantee. Our approach for the latter result is based on a better analysis of the (offline) Residual Random Greedy (RRG) algorithm of [Buchbinder-Feldman-Naor-Schwartz SODA`14], which we believe might be of independent interest.
Submitted 15 August, 2023;
originally announced August 2023.
-
Open-set Face Recognition using Ensembles trained on Clustered Data
Authors:
Rafael Henrique Vareto,
William Robson Schwartz
Abstract:
Open-set face recognition describes a scenario where unknown subjects, unseen during the training stage, appear at test time. Not only does it require methods that accurately identify individuals of interest, but it also demands approaches that effectively deal with unfamiliar faces. This work details a scalable open-set face identification approach to galleries composed of hundreds and thousands of subjects. It is composed of clustering and an ensemble of binary learning algorithms that estimates when query face samples belong to the face gallery and then retrieves their correct identity. The approach selects the most suitable gallery subjects and uses the ensemble to improve prediction performance. We carry out experiments on well-known LFW and YTF benchmarks. Results show that competitive performance can be achieved even when targeting scalability.
Submitted 14 August, 2023;
originally announced August 2023.
-
An Improved Approximation Algorithm for the Max-$3$-Section Problem
Authors:
Dor Katzelnick,
Aditya Pillai,
Roy Schwartz,
Mohit Singh
Abstract:
We consider the Max-$3$-Section problem, where we are given an undirected graph $G=(V,E)$ equipped with non-negative edge weights $w :E\rightarrow \mathbb{R}_+$ and the goal is to find a partition of $V$ into three equisized parts while maximizing the total weight of edges crossing between different parts. Max-$3$-Section is closely related to other well-studied graph partitioning problems, e.g., Max-$k$-Cut, Max-$3$-Cut, and Max-Bisection. We present a polynomial time algorithm achieving an approximation of $0.795$, which improves upon the previous best known approximation of $0.673$. The requirement of multiple parts that have equal sizes renders Max-$3$-Section much harder to cope with compared to, e.g., Max-Bisection. We show a new algorithm that combines the existing approach of the Lasserre hierarchy with a random cut strategy that suffices to give our result.
Submitted 7 August, 2023;
originally announced August 2023.
-
Symplectic Tiling Billiards, Planar Linkages, and Hyperbolic Geometry
Authors:
Richard Evan Schwartz
Abstract:
The purpose of this paper is to unite two games, symplectic billiards and tiling billiards. The new game is called symplectic tiling billiards. I will prove a result about periodic orbits of symplectic tiling billiards in a very special case and then show how this result is related to planar linkages and hyperbolic geometry.
Submitted 4 November, 2023; v1 submitted 23 July, 2023;
originally announced July 2023.
-
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
Authors:
Netta Madvil,
Yonatan Bitton,
Roy Schwartz
Abstract:
The prevalence of large-scale multimodal datasets presents unique challenges in assessing dataset quality. We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it. Our method sheds light on the importance of different modalities in datasets, as well as the relationship between them. We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality. Moreover, we find that more than 70% of the questions are solvable using several different single-modality strategies, e.g., by either looking at the video or listening to the audio, highlighting the limited integration of multiple modalities in TVQA. We leverage our annotation and analyze the MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, but also with auditory speaker identification. Based on our observations, we introduce a new test set that necessitates multiple modalities, observing a dramatic drop in model performance. Our methodology provides valuable insights into multimodal datasets and highlights the need for the development of more robust models.
Submitted 6 July, 2023;
originally announced July 2023.
-
Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research
Authors:
Ji-Ung Lee,
Haritz Puerto,
Betty van Aken,
Yuki Arase,
Jessica Zosa Forde,
Leon Derczynski,
Andreas Rücklé,
Iryna Gurevych,
Roy Schwartz,
Emma Strubell,
Jesse Dodge
Abstract:
Many recent improvements in NLP stem from the development and use of large pre-trained language models (PLMs) with billions of parameters. Large model sizes make computational cost one of the main limiting factors for training and evaluating such models, and have raised severe concerns about the sustainability, reproducibility, and inclusiveness of research on PLMs. These concerns are often based on personal experiences and observations. However, there have not been any large-scale surveys that investigate them. In this work, we provide a first attempt to quantify these concerns regarding three topics, namely, environmental impact, equity, and impact on peer reviewing. By conducting a survey with 312 participants from the NLP community, we capture existing (dis)parities between and within different groups with respect to seniority, academia, and industry; and their impact on the peer reviewing process. For each topic, we provide an analysis and devise recommendations to mitigate found disparities, some of which have already been successfully implemented. Finally, we discuss additional concerns raised by many participants in free-text responses.
Submitted 9 November, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Morphosyntactic probing of multilingual BERT models
Authors:
Judit Acs,
Endre Hamerlik,
Roy Schwartz,
Noah A. Smith,
Andras Kornai
Abstract:
We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived from the Universal Dependencies treebanks. We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain strong performance across these tasks. We then apply two methods to locate, for each probing task, where the disambiguating information resides in the input. The first is a new perturbation method that masks various parts of context; the second is the classical method of Shapley values. The most intriguing finding that emerges is a strong tendency for the preceding context to hold more information relevant to the prediction than the following context.
Submitted 9 June, 2023;
originally announced June 2023.
-
Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings
Authors:
Daniel Rotem,
Michael Hassid,
Jonathan Mamou,
Roy Schwartz
Abstract:
Adaptive inference is a simple method for reducing inference costs. The method works by maintaining multiple classifiers of different capacities, and allocating resources to each test instance according to its difficulty. In this work, we compare the two main approaches for adaptive inference, Early-Exit and Multi-Model, when training data is limited. First, we observe that for models with the same architecture and size, individual Multi-Model classifiers outperform their Early-Exit counterparts by an average of 2.3%. We show that this gap is caused by Early-Exit classifiers sharing model parameters during training, resulting in conflicting gradient updates of model weights. We find that despite this gap, Early-Exit still provides a better speed-accuracy trade-off due to the overhead of the Multi-Model approach. To address these issues, we propose SWEET (Separating Weights in Early Exit Transformers), an Early-Exit fine-tuning method that assigns each classifier its own set of unique model weights, not updated by other classifiers. We compare SWEET's speed-accuracy curve to standard Early-Exit and Multi-Model baselines and find that it outperforms both methods at fast speeds while maintaining comparable scores to Early-Exit at slow speeds. Moreover, SWEET individual classifiers outperform Early-Exit ones by 1.1% on average. SWEET enjoys the benefits of both methods, paving the way for further reduction of inference costs in NLP.
Submitted 4 June, 2023;
originally announced June 2023.
-
Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases
Authors:
Yuval Reif,
Roy Schwartz
Abstract:
NLP models often rely on superficial cues known as dataset biases to achieve impressive performance, and can fail on examples where these biases do not hold. Recent work sought to develop robust, unbiased models by filtering biased examples from training sets. In this work, we argue that such filtering can obscure the true capabilities of models to overcome biases, which might never be removed in full from the dataset. We suggest that in order to drive the development of models robust to subtle biases, dataset biases should be amplified in the training set. We introduce an evaluation framework defined by a bias-amplified training set and an anti-biased test set, both automatically extracted from existing datasets. Experiments across three notions of bias, four datasets and two models show that our framework is substantially more challenging for models than the original data splits, and even more challenging than hand-crafted challenge sets. Our evaluation framework can use any existing dataset, even those considered obsolete, to test model robustness. We hope our work will guide the development of robust models that do not rely on superficial biases and correlations. To this end, we publicly release our code and data.
Submitted 30 May, 2023;
originally announced May 2023.
-
Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers
Authors:
Valfride Nascimento,
Rayson Laroca,
Jorge de A. Lambert,
William Robson Schwartz,
David Menotti
Abstract:
Recent years have seen significant developments in the field of License Plate Recognition (LPR) through the integration of deep learning techniques and the increasing availability of training data. Nevertheless, reconstructing license plates (LPs) from low-resolution (LR) surveillance footage remains challenging. To address this issue, we introduce a Single-Image Super-Resolution (SISR) approach that integrates attention and transformer modules to enhance the detection of structural and textural features in LR images. Our approach incorporates sub-pixel convolution layers (also known as PixelShuffle) and a loss function that uses an Optical Character Recognition (OCR) model for feature extraction. We trained the proposed architecture on synthetic images created by applying heavy Gaussian noise to high-resolution LP images from two public datasets, followed by bicubic downsampling. As a result, the generated images have a Structural Similarity Index Measure (SSIM) of less than 0.10. Our results show that our approach for reconstructing these low-resolution synthesized images outperforms existing ones in both quantitative and qualitative measures. Our code is publicly available at https://github.com/valfride/lpr-rsr-ext/
Submitted 26 May, 2023;
originally announced May 2023.
-
Textually Pretrained Speech Language Models
Authors:
Michael Hassid,
Tal Remez,
Tu Anh Nguyen,
Itai Gat,
Alexis Conneau,
Felix Kreuk,
Jade Copet,
Alexandre Defossez,
Gabriel Synnaeve,
Emmanuel Dupoux,
Roy Schwartz,
Yossi Adi
Abstract:
Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language model. We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board. We empirically analyze the effect of different model design choices such as the speech tokenizer, the pretrained textual model, and the dataset size. We find that model and dataset scale both play an important role in constructing better-performing SpeechLMs. Based on our observations, we present the largest (to the best of our knowledge) SpeechLM both in terms of number of parameters and training data. We additionally introduce two spoken versions of the StoryCloze textual benchmark to further improve model evaluation and advance future research in the field. We make speech samples, code and models publicly available: https://pages.cs.huji.ac.il/adiyoss-lab/twist/ .
Submitted 30 January, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
MCTrans++: A 0-D Model for Centrifugal Mirrors
Authors:
Nick R. Schwartz,
Ian G. Abel,
Adil B. Hassam,
Myles Kelly,
Carlos A. Romero-Talamas
Abstract:
The centrifugal mirror confinement scheme incorporates supersonic rotation of a plasma into a magnetic mirror device. This concept has been shown experimentally to drastically decrease parallel losses and increase plasma stability as compared to prior axisymmetric mirrors. MCTrans++ is a 0D scoping tool which rapidly models experimental operating points in the Centrifugal Mirror Fusion Experiment (CMFX) at the University of Maryland. In the low-collisionality regime, parallel losses can be modeled analytically. A confining potential is set up that is partially ambipolar and partially centrifugal. Due to the stabilizing effects of flow-shear, the perpendicular losses can be modeled as classical. Radiation losses such as Bremsstrahlung and cyclotron emission are taken into account. A neutrals model is included, and, in some circumstances, charge-exchange losses are found to exceed all other loss mechanisms. We use the SUNDIALS ARKODE library to solve the underlying equations of this model; the resulting software is suitable for scanning large parameter spaces, and can also be used to model time-dependent phenomena such as a capacitive discharge. MCTrans++ has been used to verify results from prior centrifugal mirrors, create an experimental plan for CMFX, and find configurations for future reactor-scale fusion devices.
Submitted 11 March, 2024; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Authors:
Nitzan Bitton-Guetta,
Yonatan Bitton,
Jack Hessel,
Ludwig Schmidt,
Yuval Elovici,
Gabriel Stanovsky,
Roy Schwartz
Abstract:
Weird, unusual, and uncanny images pique the curiosity of observers because they challenge commonsense. For example, an image released during the 2022 World Cup depicts the famous soccer stars Lionel Messi and Cristiano Ronaldo playing chess, which playfully violates our expectation that their competition should occur on the football field. Humans can easily recognize and interpret these unconventional images, but can AI models do the same? We introduce WHOOPS!, a new dataset and benchmark for visual commonsense. The dataset is comprised of purposefully commonsense-defying images created by designers using publicly-available image generation tools like Midjourney. We consider several tasks posed over the dataset. In addition to image captioning, cross-modal matching, and visual question answering, we introduce a difficult explanation generation task, where models must identify and explain why a given image is unusual. Our results show that state-of-the-art models such as GPT3 and BLIP2 still lag behind human performance on WHOOPS!. We hope our dataset will inspire the development of AI models with stronger visual commonsense reasoning abilities. Data, models and code are available at the project website: whoops-benchmark.github.io
Submitted 12 August, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Quantum Gates Between Mesoscopic Spin Ensembles
Authors:
Mohamad Niknam,
Robert N. Schwartz,
Louis-S. Bouchard
Abstract:
Quantum algorithmics with single spins poses serious technological challenges such as precision fabrication, rapid decoherence, atomic-scale addressing and readout. To circumvent atomic-scale challenges, we examine the case of fully polarized mesoscopic spin ensembles (spin-coherent states) whose total angular momenta states map to qudit submanifolds. We show that in the limit where the size of the ensembles is small compared to their separation, it is possible to treat them as qubits with an effective coupling strength that scales with the number of spins. If the spins within each ensemble are decoupled (e.g., via control fields, spinning or diffusional averaging or materials engineering), one- and two-qubit gate operations can be implemented with high fidelities.
Submitted 3 March, 2023;
originally announced March 2023.
-
Successful Kinetic Impact into an Asteroid for Planetary Defense
Authors:
R. Terik Daly,
Carolyn M. Ernst,
Olivier S. Barnouin,
Nancy L. Chabot,
Andrew S. Rivkin,
Andrew F. Cheng,
Elena Y. Adams,
Harrison F. Agrusa,
Elisabeth D. Abel,
Amy L. Alford,
Erik I. Asphaug,
Justin A. Atchison,
Andrew R. Badger,
Paul Baki,
Ronald-L. Ballouz,
Dmitriy L. Bekker,
Julie Bellerose,
Shyam Bhaskaran,
Bonnie J. Buratti,
Saverio Cambioni,
Michelle H. Chen,
Steven R. Chesley,
George Chiu,
Gareth S. Collins,
Matthew W. Cox
, et al. (76 additional authors not shown)
Abstract:
While no known asteroid poses a threat to Earth for at least the next century, the catalog of near-Earth asteroids is incomplete for objects whose impacts would produce regional devastation. Several approaches have been proposed to potentially prevent an asteroid impact with Earth by deflecting or disrupting an asteroid. A test of kinetic impact technology was identified as the highest priority space mission related to asteroid mitigation. NASA's Double Asteroid Redirection Test (DART) mission is the first full-scale test of kinetic impact technology. The mission's target asteroid was Dimorphos, the secondary member of the S-type binary near-Earth asteroid (65803) Didymos. This binary asteroid system was chosen to enable ground-based telescopes to quantify the asteroid deflection caused by DART's impact. While past missions have utilized impactors to investigate the properties of small bodies, those earlier missions were not intended to deflect their targets and did not achieve measurable deflections. Here we report the DART spacecraft's autonomous kinetic impact into Dimorphos and reconstruct the impact event, including the timeline leading to impact, the location and nature of the DART impact site, and the size and shape of Dimorphos. The successful impact of the DART spacecraft with Dimorphos and the resulting change in Dimorphos's orbit demonstrate that kinetic impactor technology is a viable technique to potentially defend Earth if necessary.
Submitted 3 March, 2023;
originally announced March 2023.
-
Ejecta from the DART-produced active asteroid Dimorphos
Authors:
Jian-Yang Li,
Masatoshi Hirabayashi,
Tony L. Farnham,
Jessica M. Sunshine,
Matthew M. Knight,
Gonzalo Tancredi,
Fernando Moreno,
Brian Murphy,
Cyrielle Opitom,
Steve Chesley,
Daniel J. Scheeres,
Cristina A. Thomas,
Eugene G. Fahnestock,
Andrew F. Cheng,
Linda Dressel,
Carolyn M. Ernst,
Fabio Ferrari,
Alan Fitzsimmons,
Simone Ieva,
Stavro L. Ivanovski,
Teddy Kareta,
Ludmilla Kolokolova,
Tim Lister,
Sabina D. Raducan,
Andrew S. Rivkin
, et al. (39 additional authors not shown)
Abstract:
Some active asteroids have been proposed to be the result of impact events. Because active asteroids are generally discovered serendipitously only after their tail formation, the process of the impact ejecta evolving into a tail has never been directly observed. NASA's Double Asteroid Redirection Test (DART) mission, apart from having successfully changed the orbital period of Dimorphos, demonstrated the activation process of an asteroid from an impact under precisely known impact conditions. Here we report the observations of the DART impact ejecta with the Hubble Space Telescope (HST) from impact time T+15 minutes to T+18.5 days at spatial resolutions of ~2.1 km per pixel. Our observations reveal a complex evolution of ejecta, which is first dominated by the gravitational interaction between the Didymos binary system and the ejected dust and later by solar radiation pressure. The lowest-speed ejecta dispersed via a sustained tail that displayed a consistent morphology with previously observed asteroid tails thought to be produced by impact. The ejecta evolution following DART's controlled impact experiment thus provides a framework for understanding the fundamental mechanisms acting on asteroids disrupted by natural impact.
Submitted 2 March, 2023;
originally announced March 2023.
-
Divide and Conquer: A Distributed Approach to Five Point Energy Minimization
Authors:
Richard Evan Schwartz
Abstract:
This work rigorously verifies the phase transition in 5-point energy minimization first observed by Melnyk-Knop-Smith in 1977. More precisely, we prove that there is a constant S = [15+24/512,15+25/512] such that the triangular bi-pyramid is the energy minimizer with respect to the s-power law potential for all s in (0,S) and some pyramid with square base is the unique minimizer for all s in (S,15+512/25]. Taking s=1 gives another solution to Thomson's 5 electron problem from 1904.
The work here is a simplification of my monograph from 6 years ago which also proved the main result. The version here is half as long as the original. Also, I wrote it in an experimental style which facilitates the verification process. As I explicitly indicate in the text, the proof can be broken up into 10 independent pieces, all less than 15 pages, which can be checked independently of the other pieces.
Submitted 22 January, 2023; v1 submitted 12 January, 2023;
originally announced January 2023.
-
VASR: Visual Analogies of Situation Recognition
Authors:
Yonatan Bitton,
Ron Yosef,
Eli Strugo,
Dafna Shahaf,
Roy Schwartz,
Gabriel Stanovsky
Abstract:
A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different situations. We introduce a novel task, Visual Analogies of Situation Recognition, adapting the classical word-analogy task into the visual domain. Given a triplet of images, the task is to select an image candidate B' that completes the analogy (A to A' is like B to what?). Unlike previous work on visual analogy that focused on simple image transformations, we tackle complex analogies requiring understanding of scenes.
We leverage situation recognition annotations and the CLIP model to generate a large set of 500k candidate analogies. Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label ~80% of the time (chance level 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy). We hope our dataset will encourage the development of new analogy-making models. Website: https://vasr-dataset.github.io/
Submitted 8 December, 2022;
originally announced December 2022.
-
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
Authors:
Michael Hassid,
Hao Peng,
Daniel Rotem,
Jungo Kasai,
Ivan Montero,
Noah A. Smith,
Roy Schwartz
Abstract:
The attention mechanism is considered the backbone of the widely-used Transformer architecture. It contextualizes the input by computing input-specific attention matrices. We find that this mechanism, while powerful and elegant, is not as important as typically thought for pretrained language models. We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones -- the average attention weights over multiple inputs. We use PAPA to analyze several established pretrained Transformers on six downstream tasks. We find that without any input-dependent attention, all models achieve competitive performance -- an average relative drop of only 8% from the probing baseline. Further, little or no performance drop is observed when replacing half of the input-dependent attention matrices with constant (input-independent) ones. Interestingly, we show that better-performing models lose more from applying our method than weaker models, suggesting that the utilization of the input-dependent attention mechanism might be a factor in their success. Our results motivate research on simpler alternatives to input-dependent attention, as well as on methods for better utilization of this mechanism in the Transformer architecture.
Submitted 7 November, 2022;
originally announced November 2022.
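The core idea in the abstract above, replacing input-dependent attention matrices with their average over multiple inputs, can be illustrated with a toy sketch. This is not the authors' PAPA implementation: the function names `average_attention` and `apply_attention`, the plain nested-list representation, and the assumption that all inputs have the same length are hypothetical choices made only for illustration.

```python
# Toy illustration only; names and data are hypothetical, not from the paper.

def average_attention(attention_matrices):
    """Element-wise mean of several equally sized attention matrices."""
    n = len(attention_matrices)
    rows = len(attention_matrices[0])
    cols = len(attention_matrices[0][0])
    return [
        [sum(m[i][j] for m in attention_matrices) / n for j in range(cols)]
        for i in range(rows)
    ]


def apply_attention(weights, values):
    """Mix value vectors: output[i] = sum over j of weights[i][j] * values[j]."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]


# Attention matrices observed on two made-up inputs of length 2.
observed = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.5, 0.5], [0.0, 1.0]],
]

# The constant matrix is computed once and reused for every new input.
constant = average_attention(observed)

# Value vectors of a new input; the attention weights no longer depend on it.
values = [[2.0, 0.0], [0.0, 4.0]]
output = apply_attention(constant, values)
```

Because each observed row sums to 1, the averaged rows do as well, so the constant matrix still produces a weighted average of the value vectors.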
-
Combining Attention Module and Pixel Shuffle for License Plate Super-Resolution
Authors:
Valfride Nascimento,
Rayson Laroca,
Jorge de A. Lambert,
William Robson Schwartz,
David Menotti
Abstract:
The License Plate Recognition (LPR) field has made impressive advances in the last decade due to novel deep learning approaches combined with the increased availability of training data. However, it still has some open issues, especially when the data come from low-resolution (LR) and low-quality images/videos, as in surveillance systems. This work focuses on license plate (LP) reconstruction in LR and low-quality images. We present a Single-Image Super-Resolution (SISR) approach that extends the attention/transformer module concept by exploiting the capabilities of PixelShuffle layers and that has an improved loss function based on LPR predictions. For training the proposed architecture, we use synthetic images generated by applying heavy Gaussian noise in terms of Structural Similarity Index Measure (SSIM) to the original high-resolution (HR) images. In our experiments, the proposed method outperformed the baselines both quantitatively and qualitatively. The datasets we created for this work are publicly available to the research community at https://github.com/valfride/lpr-rsr/
Submitted 30 October, 2022;
originally announced October 2022.
-
After DART: Using the first full-scale test of a kinetic impactor to inform a future planetary defense mission
Authors:
Thomas S. Statler,
Sabina D. Raducan,
Olivier S. Barnouin,
Mallory E. DeCoster,
Steven R. Chesley,
Brent Barbee,
Harrison F. Agrusa,
Saverio Cambioni,
Andrew F. Cheng,
Elisabetta Dotto,
Siegfried Eggl,
Eugene G. Fahnestock,
Fabio Ferrari,
Dawn Graninger,
Alain Herique,
Isabel Herreros,
Masatoshi Hirabayashi,
Stavro Ivanovski,
Martin Jutzi,
Özgür Karatekin,
Alice Lucchetti,
Robert Luther,
Rahil Makadia,
Francesco Marzari,
Patrick Michel, et al. (16 additional authors not shown)
Abstract:
NASA's Double Asteroid Redirection Test (DART) is the first full-scale test of an asteroid deflection technology. Results from the hypervelocity kinetic impact and Earth-based observations, coupled with LICIACube and the later Hera mission, will result in measurement of the momentum transfer efficiency accurate to ~10% and characterization of the Didymos binary system. But DART is a single experiment; how could these results be used in a future planetary defense necessity involving a different asteroid? We examine what aspects of Dimorphos's response to kinetic impact will be constrained by DART results; how these constraints will help refine knowledge of the physical properties of asteroidal materials and predictive power of impact simulations; what information about a potential Earth impactor could be acquired before a deflection effort; and how design of a deflection mission should be informed by this understanding. We generalize the momentum enhancement factor $β$, showing that a particular direction-specific $β$ will be directly determined by the DART results, and that a related direction-specific $β$ is a figure of merit for a kinetic impact mission. The DART $β$ determination constrains the ejecta momentum vector, which, with hydrodynamic simulations, constrains the physical properties of Dimorphos's near-surface. In a hypothetical planetary defense exigency, extrapolating these constraints to a newly discovered asteroid will require Earth-based observations and benefit from in-situ reconnaissance. We show representative predictions for momentum transfer based on different levels of reconnaissance and discuss strategic targeting to optimize the deflection and reduce the risk of a counterproductive deflection in the wrong direction.
Submitted 23 September, 2022;
originally announced September 2022.
-
Effects of impact and target parameters on the results of a kinetic impactor: predictions for the Double Asteroid Redirection Test (DART) mission
Authors:
Angela M. Stickle,
Mallory E. DeCoster,
Christoph Burger,
Wendy K. Caldwell,
Dawn Graninger,
Kathryn M. Kumamoto,
Robert Luther,
Jens Ormö,
Sabina Raducan,
Emma Rainey,
Christoph M. Schäfer,
James D. Walker,
Yun Zhang,
Patrick Michel,
J. Michael Owen,
Olivier Barnouin,
Andy F. Cheng,
Sidney Cochron,
Gareth S. Collins,
Thomas M. Davison,
Elisabetta Dotto,
Fabio Ferrari,
M. Isabel Herreros,
Stavro L. Ivanovski,
Martin Jutzi, et al. (8 additional authors not shown)
Abstract:
The Double Asteroid Redirection Test (DART) spacecraft will impact into the asteroid Dimorphos on September 26, 2022 as a test of the kinetic impactor technique for planetary defense. The efficiency of the deflection following a kinetic impactor can be represented using the momentum enhancement factor, Beta, which is dependent on factors such as impact geometry and the specific target material properties. Currently, very little is known about Dimorphos and its material properties, which introduces uncertainty in the results of the deflection efficiency observables, including crater formation, ejecta distribution, and Beta. The DART Impact Modeling Working Group (IWG) is responsible for using impact simulations to better understand the results of the DART impact. Pre-impact simulation studies also provide considerable insight into how different properties and impact scenarios affect momentum enhancement following a kinetic impact. This insight provides a basis for predicting the effects of the DART impact and the first understanding of how to interpret results following the encounter. Following the DART impact, the knowledge gained from these studies will inform the initial simulations that will recreate the impact conditions, including providing estimates for potential material properties of Dimorphos and Beta resulting from DART's impact. This paper summarizes, at a high level, what has been learned from the IWG simulations and experiments in preparation for the DART impact. While unknown, estimates for reasonable potential material properties of Dimorphos provide predictions for Beta of 1-5, depending on end-member cases in the strength regime.
△ Less
Submitted 14 September, 2022;
originally announced September 2022.
-
Efficient Methods for Natural Language Processing: A Survey
Authors:
Marcos Treviso,
Ji-Ung Lee,
Tianchu Ji,
Betty van Aken,
Qingqing Cao,
Manuel R. Ciosici,
Michael Hassid,
Kenneth Heafield,
Sara Hooker,
Colin Raffel,
Pedro H. Martins,
André F. T. Martins,
Jessica Zosa Forde,
Peter Milder,
Edwin Simpson,
Noam Slonim,
Jesse Dodge,
Emma Strubell,
Niranjan Balasubramanian,
Leon Derczynski,
Iryna Gurevych,
Roy Schwartz
Abstract:
Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.
△ Less
Submitted 24 March, 2023; v1 submitted 31 August, 2022;
originally announced September 2022.
-
Continued Fractions and the 4-Color Theorem
Authors:
Richard Evan Schwartz
Abstract:
We study the geometry of some proper 4-colorings of the vertices of sphere triangulations with degree sequence 6,...,6,2,2,2. Such triangulations are the simplest examples which have non-negative combinatorial curvature. The examples we construct, which are roughly extremal in some sense, are based on a novel geometric interpretation of continued fractions. We also present a conjectural sharp "isoperimetric inequality" for colorings of this kind of triangulation.
Submitted 29 December, 2023; v1 submitted 10 August, 2022;
originally announced August 2022.
-
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
Authors:
Yonatan Bitton,
Nitzan Bitton Guetta,
Ron Yosef,
Yuval Elovici,
Mohit Bansal,
Gabriel Stanovsky,
Roy Schwartz
Abstract:
While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills. In this work, we introduce WinoGAViL: an online game of vision-and-language associations (e.g., between werewolves and a full moon), used as a dynamic evaluation benchmark. Inspired by the popular card game Codenames, a spymaster gives a textual cue related to several visual candidates, and another player tries to identify them. Human players are rewarded for creating associations that are challenging for a rival AI model but still solvable by other human players. We use the game to collect 3.5K instances, finding that they are intuitive for humans (>90% Jaccard index) but challenging for state-of-the-art AI models, where the best model (ViLT) achieves a score of 52%, succeeding mostly where the cue is visually salient. Our analysis as well as the feedback we collect from players indicate that the collected associations require diverse reasoning skills, including general knowledge, common sense, abstraction, and more. We release the dataset, the code and the interactive game, allowing future data collection that can be used to develop models with better association abilities.
Submitted 11 October, 2022; v1 submitted 25 July, 2022;
originally announced July 2022.
-
Predictions for the Dynamical States of the Didymos System before and after the Planned DART Impact
Authors:
Derek C. Richardson,
Harrison F. Agrusa,
Brent Barbee,
William F. Bottke,
Andrew F. Cheng,
Siegfried Eggl,
Fabio Ferrari,
Masatoshi Hirabayashi,
Özgür Karatekin,
Jay McMahon,
Stephen R. Schwartz,
Ronald-Louis Ballouz,
Adriano Campo Bagatin,
Elisabetta Dotto,
Eugene G. Fahnestock,
Oscar Fuentes-Muñoz,
Ioannis Gkolias,
Douglas P. Hamilton,
Seth A. Jacobson,
Martin Jutzi,
Josh Lyzhoft,
Rahil Makadia,
Alex J. Meyer,
Patrick Michel,
Ryota Nakano, et al. (11 additional authors not shown)
Abstract:
NASA's Double Asteroid Redirection Test (DART) spacecraft is planned to impact the natural satellite of (65803) Didymos, Dimorphos, around 23:14 UTC on 26 September 2022, causing a reduction in its orbital period that will be measurable with ground-based observations. This test of kinetic impactor technology will provide the first estimate of the momentum transfer enhancement factor $β$ at a realistic scale, wherein ejecta from the impact provides an additional deflection to the target. Earth-based observations, the LICIACube spacecraft (to be detached from DART prior to impact), and ESA's follow-up Hera mission to launch in 2024, will provide additional characterization of the deflection test. Together Hera and DART comprise the Asteroid Impact and Deflection Assessment (AIDA) cooperation between NASA and ESA. Here the predicted dynamical states of the binary system upon arrival and after impact are presented. The assumed dynamically relaxed state of the system will be excited by the impact, leading to an increase in eccentricity and slight tilt of the orbit together with enhanced libration of Dimorphos with amplitude dependent on the currently poorly known target shape. Free rotation around the moon's long axis may also be triggered and the orbital period will experience variations from seconds to minutes over timescales of days to months. Shape change of either body due to cratering or mass wasting triggered by crater formation and ejecta may affect $β$ but can be constrained through additional measurements. Both BYORP and gravity tides may cause measurable orbital changes on the timescale of Hera's rendezvous.
Submitted 14 July, 2022;
originally announced July 2022.
-
Fewer Errors, but More Stereotypes? The Effect of Model Size on Gender Bias
Authors:
Yarden Tal,
Inbal Magar,
Roy Schwartz
Abstract:
The size of pretrained models is increasing, and so is their performance on a variety of NLP tasks. However, as their memorization capacity grows, they might pick up more social biases. In this work, we examine the connection between model size and its gender bias (specifically, occupational gender bias). We measure bias in three masked language model families (RoBERTa, DeBERTa, and T5) in two setups: directly, using a prompt-based method, and using a downstream task (Winogender). We find on the one hand that larger models receive higher bias scores on the former task, but when evaluated on the latter, they make fewer gender errors. To examine these potentially conflicting results, we carefully investigate the behavior of the different models on Winogender. We find that while larger models outperform smaller ones, the probability that their mistakes are caused by gender bias is higher. Moreover, we find that the proportion of stereotypical errors compared to anti-stereotypical ones grows with the model size. Our findings highlight the potential risks that can arise from increasing model size.
Submitted 20 June, 2022;
originally announced June 2022.
-
Measuring the Carbon Intensity of AI in Cloud Instances
Authors:
Jesse Dodge,
Taylor Prewitt,
Remi Tachet Des Combes,
Erika Odmark,
Roy Schwartz,
Emma Strubell,
Alexandra Sasha Luccioni,
Noah A. Smith,
Nicole DeCario,
Will Buchanan
Abstract:
By providing unprecedented access to computational resources, cloud computing has enabled rapid growth in technologies such as machine learning, the computational demands of which incur a high energy cost and a commensurate carbon footprint. As a result, recent scholarship has called for better estimates of the greenhouse gas impact of AI: data scientists today do not have easy or reliable access to measurements of this information, precluding development of actionable tactics. Cloud providers presenting information about software carbon intensity to users is a fundamental stepping stone towards minimizing emissions. In this paper, we provide a framework for measuring software carbon intensity, and propose to measure operational carbon emissions by using location-based and time-specific marginal emissions data per energy unit. We provide measurements of operational software carbon intensity for a set of modern models for natural language processing and computer vision, and a wide range of model sizes, including pretraining of a 6.1 billion parameter language model. We then evaluate a suite of approaches for reducing emissions on the Microsoft Azure cloud compute platform: using cloud instances in different geographic regions, using cloud instances at different times of day, and dynamically pausing cloud instances when the marginal carbon intensity is above a certain threshold. We confirm previous results that the geographic region of the data center plays a significant role in the carbon intensity for a given cloud instance, and find that choosing an appropriate region can have the largest operational emissions reduction impact. We also show that the time of day has notable impact on operational software carbon intensity. Finally, we conclude with recommendations for how machine learning practitioners can use software carbon intensity information to reduce environmental impact.
Submitted 10 June, 2022;
originally announced June 2022.
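The measurement approach described in the abstract above (energy used in each time interval multiplied by the time-specific marginal emissions per energy unit) and the threshold-based pausing strategy can be sketched roughly as follows. This is a simplified illustration, not the paper's framework: the function names, the fixed energy draw per interval, the assumption that a paused instance draws no power, and all numbers are made up.

```python
# Simplified sketch; function names, units, and numbers are hypothetical.

def operational_emissions(energy_kwh, marginal_intensity):
    """Sum of energy (kWh) times marginal intensity (gCO2e/kWh) per interval."""
    if len(energy_kwh) != len(marginal_intensity):
        raise ValueError("need one intensity value per energy measurement")
    return sum(e * c for e, c in zip(energy_kwh, marginal_intensity))


def run_with_pausing(work_intervals, energy_per_interval_kwh,
                     intensity_series, threshold):
    """Run a job only in intervals whose marginal intensity is at or below
    the threshold. Returns (emissions in gCO2e, intervals elapsed)."""
    emitted = 0.0
    done = 0
    elapsed = 0
    for intensity in intensity_series:
        if done == work_intervals:
            break
        elapsed += 1
        if intensity <= threshold:  # otherwise the job stays paused
            emitted += energy_per_interval_kwh * intensity
            done += 1
    if done < work_intervals:
        raise ValueError("intensity series too short to finish the job")
    return emitted, elapsed


# Made-up marginal intensities (gCO2e/kWh) for five consecutive intervals.
intensity = [400.0, 600.0, 300.0, 700.0, 350.0]

# A job needing three intervals of work at 2 kWh each, run without pausing.
baseline = operational_emissions([2.0, 2.0, 2.0], intensity[:3])

# The same job, paused whenever intensity exceeds 500 gCO2e/kWh.
paused_emissions, paused_elapsed = run_with_pausing(3, 2.0, intensity, 500.0)
```

In this toy series, pausing lowers emissions from 2600 to 2100 gCO2e, but the job takes five intervals instead of three, which shows the trade-off between emissions and completion time.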
-
Trisecting the 9-vertex complex projective plane
Authors:
Richard Evan Schwartz
Abstract:
In this paper we will give a short and direct proof that Wolfgang Kuehnel's 9-vertex triangulation of the complex projective plane really is the complex projective plane. The idea of our proof is to recall the trisection of the complex projective plane into 3 bi-disks and then to see this trisection inside a symmetry-breaking subdivision of the triangulation. Following the basic proof, we will elaborate on the construction.
Submitted 4 July, 2022; v1 submitted 1 May, 2022;
originally announced May 2022.