-
AI and the EU Digital Markets Act: Addressing the Risks of Bigness in Generative AI
Authors:
Ayse Gizem Yasar,
Andrew Chong,
Evan Dong,
Thomas Krendl Gilbert,
Sarah Hladikova,
Roland Maio,
Carlos Mougan,
Xudong Shen,
Shubham Singh,
Ana-Andreea Stoica,
Savannah Thais,
Miri Zilka
Abstract:
As AI technology advances rapidly, concerns over the risks of bigness in digital markets are also growing. The EU's Digital Markets Act (DMA) aims to address these risks. Still, the current framework may not adequately cover generative AI systems that could become gateways for AI-based services. This paper argues for integrating certain AI software as core platform services and classifying certain developers as gatekeepers under the DMA. We also propose an assessment of gatekeeper obligations to ensure they cover generative AI services. As the EU considers generative AI-specific rules and possible DMA amendments, this paper provides insights towards diversity and openness in generative AI services.
Submitted 7 July, 2023;
originally announced August 2023.
-
Evaluating Language Models for Mathematics through Interactions
Authors:
Katherine M. Collins,
Albert Q. Jiang,
Simon Frieder,
Lionel Wong,
Miri Zilka,
Umang Bhatt,
Thomas Lukasiewicz,
Yuhuai Wu,
Joshua B. Tenenbaum,
William Hart,
Timothy Gowers,
Wenda Li,
Adrian Weller,
Mateja Jamnik
Abstract:
There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs and is insufficient for making an informed decision about which LLMs can sensibly be used, and in which assistive settings. Static assessment fails to account for the essential interactive element in LLM deployment and therefore limits our understanding of language model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants ranging from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analysing MathConverse, we derive a taxonomy of human behaviours and find that, despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness in LLM generations, amongst other findings. Further, we gain a more granular understanding of GPT-4's mathematical problem-solving through a series of case studies contributed by expert mathematicians. We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and are more interpretable and concise may constitute better assistants. Interactive evaluation is a promising way to navigate the capabilities of these models; humans should be aware of language models' algebraic fallibility and discern where they are appropriate to use.
Submitted 5 November, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
The Progression of Disparities within the Criminal Justice System: Differential Enforcement and Risk Assessment Instruments
Authors:
Miri Zilka,
Riccardo Fogliato,
Jiri Hron,
Bradley Butcher,
Carolyn Ashurst,
Adrian Weller
Abstract:
Algorithmic risk assessment instruments (RAIs) increasingly inform decision-making in criminal justice. RAIs largely rely on arrest records as a proxy for underlying crime. Problematically, the extent to which arrests reflect overall offending can vary with a person's characteristics. We examine how the disconnect between crime and arrest rates impacts RAIs and their evaluation. Our main contribution is a method for quantifying this bias by estimating the number of unobserved offenses associated with particular demographics. These unobserved offenses are then used to augment real-world arrest records, creating part-real, part-synthetic crime records. Using this data, we estimate that four currently deployed RAIs assign 0.5--2.8 percentage points higher risk scores to Black individuals than to White individuals with a similar \emph{arrest} record, but the gap grows to 4.5--11.0 percentage points when we match on the semi-synthetic \emph{crime} record. We conclude by discussing the potential risks around the use of RAIs, highlighting how they may exacerbate existing inequalities if the underlying disparities of the criminal justice system are not taken into account. In light of our findings, we provide recommendations to improve the development and evaluation of such tools.
Submitted 12 May, 2023;
originally announced May 2023.
-
Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents
Authors:
Bradley Butcher,
Miri Zilka,
Darren Cook,
Jiri Hron,
Adrian Weller
Abstract:
While humans can extract information from unstructured text with high precision and recall, this is often too time-consuming to be practical. Automated approaches, on the other hand, produce nearly immediate results but may not be reliable enough for high-stakes applications where precision is essential. In this work, we consider the benefits and drawbacks of various human-only, human-machine, and machine-only information extraction approaches. We argue for the utility of a human-in-the-loop approach in applications where high precision is required but purely manual extraction is infeasible. We present a framework and an accompanying tool for information extraction using weak-supervision labelling with human validation. We demonstrate our approach on three criminal justice datasets. We find that the combination of computer speed and human understanding yields precision comparable to manual annotation while requiring only a fraction of the time, and significantly outperforms fully automated baselines in terms of precision.
Submitted 18 February, 2023;
originally announced February 2023.
-
Can We Automate the Analysis of Online Child Sexual Exploitation Discourse?
Authors:
Darren Cook,
Miri Zilka,
Heidi DeSandre,
Susan Giles,
Adrian Weller,
Simon Maskell
Abstract:
Social media's growing popularity raises concerns around children's online safety. Interactions between minors and adults with predatory intentions are a particularly grave concern. Research into online sexual grooming has often relied on domain experts to manually annotate conversations, limiting both scale and scope. In this work, we test how well automated methods can detect conversational behaviors and replace an expert human annotator. Informed by psychological theories of online grooming, we label 6,772 chat messages sent by child-sex offenders with one of eleven predatory behaviors. We train bag-of-words and natural language inference models to classify each behavior, and show that the best-performing models classify behaviors in a manner that is consistent with, but not on par with, human annotation.
Submitted 25 September, 2022;
originally announced September 2022.
-
Transparency, Governance and Regulation of Algorithmic Tools Deployed in the Criminal Justice System: a UK Case Study
Authors:
Miri Zilka,
Holli Sargeant,
Adrian Weller
Abstract:
We present a survey of tools used in the criminal justice system in the UK in three categories: data infrastructure, data analysis, and risk prediction. Many tools are currently in deployment, offering potential benefits, including improved efficiency and consistency. However, there are also important concerns. Transparent information about these tools, their purpose, how they are used, and by whom is difficult to obtain. Even when information is available, it is often insufficient to enable a satisfactory evaluation. More work is needed to establish governance mechanisms to ensure that tools are deployed in a transparent, safe and ethical way. We call for more engagement with stakeholders and greater documentation of the intended goal of a tool, how it will achieve this goal compared to other options, and how it will be monitored in deployment. We highlight additional points to consider when evaluating the trustworthiness of deployed tools and make concrete proposals for policy.
Submitted 1 June, 2022; v1 submitted 30 May, 2022;
originally announced May 2022.
-
Racial Disparities in the Enforcement of Marijuana Violations in the US
Authors:
Bradley Butcher,
Chris Robinson,
Miri Zilka,
Riccardo Fogliato,
Carolyn Ashurst,
Adrian Weller
Abstract:
Racial disparities in US drug arrest rates have been observed for decades, but their causes and policy implications are still contested. Some have argued that the disparities largely reflect differences in drug use between racial groups, while others have hypothesized that discriminatory enforcement policies and police practices play a significant role. In this work, we analyze racial disparities in the enforcement of marijuana violations in the US. Using data from the National Incident-Based Reporting System (NIBRS) and the National Survey on Drug Use and Health (NSDUH) programs, we investigate whether marijuana usage and purchasing behaviors can explain the racial composition of offenders in police records. We examine potential driving mechanisms behind these disparities and the extent to which county-level socioeconomic factors are associated with corresponding disparities. Our results indicate that the significant racial disparities in reported incidents and arrests cannot be explained by differences in marijuana days-of-use alone. Variations in the location where marijuana is purchased and in the frequency of these purchases partially explain the observed disparities. We observe an increase in racial disparities across most counties over the last decade, with the greatest increases in states that legalized the use of marijuana within this timeframe. Income, high school graduation rate, and rate of employment positively correlate with larger racial disparities, while the rate of incarceration is negatively correlated. We conclude with a discussion of the implications of the observed racial disparities in the context of algorithmic fairness.
Submitted 1 June, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
The star formation history of the Milky Way's Nuclear Star Cluster
Authors:
O. Pfuhl,
T. K. Fritz,
M. Zilka,
H. Maness,
F. Eisenhauer,
R. Genzel,
S. Gillessen,
T. Ott,
K. Dodds-Eden,
A. Sternberg
Abstract:
We present spatially resolved imaging and integral field spectroscopy data for 450 cool giant stars within 1 pc of Sgr A*. We use the prominent CO bandheads to derive effective temperatures of individual giants. Additionally, we present the deepest spectroscopic observation of the Galactic Center so far, probing the number of B9/A0 main sequence stars ($2.2-2.8\,M_\odot$) in two deep fields. From spectro-photometry we construct a Hertzsprung-Russell diagram of the red giant population and fit the observed diagram with model populations to derive the star formation history of the nuclear cluster.
We find that (1) the average nuclear star-formation rate dropped from an initial maximum ~10 Gyr ago to a deep minimum 1-2 Gyr ago and increased again during the last few hundred Myr; (2) roughly 80% of the stellar mass formed more than 5 Gyr ago; and (3) mass estimates within $R \sim 1$ pc of Sgr A* favor a dominant star formation mode with a 'normal' Chabrier/Kroupa initial mass function for the majority of the past star formation in the Galactic Center. The bulk stellar mass seems to have formed under conditions significantly different from those of the young stellar disks, perhaps because at the time of the formation of the nuclear cluster the massive black hole and its sphere of influence were much smaller than today.
Submitted 7 October, 2011;
originally announced October 2011.
-
An Extremely Top-Heavy IMF in the Galactic Center Stellar Disks
Authors:
H. Bartko,
F. Martins,
S. Trippe,
T. K. Fritz,
R. Genzel,
T. Ott,
F. Eisenhauer,
S. Gillessen,
T. Paumard,
T. Alexander,
K. Dodds-Eden,
O. Gerhard,
Y. Levin,
L. Mascetti,
S. Nayakshin,
H. B. Perets,
G. Perrin,
O. Pfuhl,
M. J. Reid,
D. Rouan,
M. Zilka,
A. Sternberg
Abstract:
We present new observations of the nuclear star cluster in the central parsec of the Galaxy with the adaptive-optics-assisted integral field spectrograph SINFONI on the ESO/VLT. Our work allows the spectroscopic detection of early- and late-type stars to m_K ≥ 16, more than 2 magnitudes deeper than our previous data sets. Our observations result in a total sample of 177 bona fide early-type stars. We find that most of these Wolf-Rayet (WR), O- and B-type stars reside in two strongly warped disks between 0.8" and 12" from Sgr A*, as well as in a central compact concentration (the S-star cluster) centered on Sgr A*. The later-type B stars (m_K > 15) in the radial interval between 0.8" and 12" seem to be in a more isotropic distribution outside the disks. The observed dearth of late-type stars in the central few arcseconds is puzzling, even when allowing for stellar collisions. The stellar mass function of the disk stars is extremely top-heavy, with a best-fit power law of dN/dm ∝ m^(-0.45 ± 0.3). Since at least the WR/O-stars were formed in situ in a single star formation event ~6 Myr ago, this mass function probably reflects the initial mass function (IMF). The mass functions of the S-stars inside 0.8" and of the early-type stars at distances beyond 12" are compatible with a standard Salpeter/Kroupa IMF (best-fit power law of dN/dm ∝ m^(-2.15 ± 0.3)).
Submitted 10 November, 2009; v1 submitted 15 August, 2009;
originally announced August 2009.
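To make "extremely top-heavy" concrete, one can compare the fraction of stellar mass locked in massive stars under the two best-fit slopes quoted in the IMF abstract above (0.45 for the disk stars, 2.15 for the S-stars and outer early-type stars). The sketch below integrates m · dN/dm analytically for a single power law. The mass limits (1 to 120 solar masses) and the 20 solar-mass cut are illustrative assumptions, not values from the paper, the ±0.3 fit uncertainties are ignored, and this is not the authors' analysis.

```python
def mass_fraction_above(m_cut, alpha, m_lo=1.0, m_hi=120.0):
    """Fraction of total stellar mass in stars with m_cut <= m <= m_hi,
    for a single power-law mass function dN/dm ∝ m**(-alpha) on [m_lo, m_hi].

    All masses are in solar masses. The default limits are illustrative
    assumptions, not values taken from the paper.
    """
    if not (m_lo <= m_cut <= m_hi):
        raise ValueError("m_cut must lie within [m_lo, m_hi]")
    if alpha == 2.0:
        raise ValueError("alpha == 2 requires the logarithmic antiderivative")
    p = 2.0 - alpha  # exponent after weighting dN/dm by m to get mass

    def antiderivative(m):
        # Integral of m * m**(-alpha) dm = m**p / p (valid for p != 0)
        return m ** p / p

    total = antiderivative(m_hi) - antiderivative(m_lo)
    above = antiderivative(m_hi) - antiderivative(m_cut)
    return above / total


# Best-fit slopes from the abstract: 0.45 (disk stars, top-heavy)
# and 2.15 (S-stars and outer early-type stars, Salpeter/Kroupa-like).
top_heavy = mass_fraction_above(20.0, 0.45)
standard = mass_fraction_above(20.0, 2.15)
print(f"mass fraction above 20 M_sun: top-heavy {top_heavy:.2f}, standard {standard:.2f}")
```

Under these assumed limits, roughly 0.94 of the mass sits above 20 solar masses for the top-heavy slope, versus about 0.29 for the standard slope. The closed-form antiderivative is used instead of numerical integration because a single power law integrates exactly for any slope other than 2.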