Search | arXiv e-print repository

Expressivity of Neural Networks with Random Weights and Learned Biases

Authors: Ezekiel Williams, Avery Hee-Woon Ryoo, Thomas Jiralerspong, Alexandre Payeur, Matthew G. Perich, Luca Mazzucato, Guillaume Lajoie

Abstract: Landmark universal function approximation results for neural networks with trained weights and biases provided impetus for the ubiquitous use of neural networks as learning models in Artificial Intelligence (AI) and neuroscience. Recent work has pushed the bounds of universal approximation by showing that arbitrary functions can similarly be learned by tuning smaller subsets of parameters, for example the output weights, within randomly initialized networks. Motivated by the fact that biases can be interpreted as biologically plausible mechanisms for adjusting unit outputs in neural networks, such as tonic inputs or activation thresholds, we investigate the expressivity of neural networks with random weights where only biases are optimized. We provide theoretical and numerical evidence demonstrating that feedforward neural networks with fixed random weights can be trained to perform multiple tasks by learning biases only. We further show that an equivalent result holds for recurrent neural networks predicting dynamical system trajectories. Our results are relevant to neuroscience, where they demonstrate the potential for behaviourally relevant changes in dynamics without modifying synaptic weights, as well as for AI, where they shed light on multi-task methods such as bias fine-tuning and unit masking.

Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

Comments: change to article metadata only: author name typo correction

arXiv:2406.11423 [pdf, other]

Dredge Word, Social Media, and Webgraph Networks for Unreliable Website Classification and Identification

Authors: Evan M. Williams, Peter Carragher, Kathleen M. Carley

Abstract: In an attempt to mimic the complex paths through which unreliable content spreads between search engines and social media, we explore the impact of incorporating both webgraph and large-scale social media contexts into website credibility classification and discovery systems. We further explore the usage of what we define as \textit{dredge words} on social media -- terms or phrases for which unreliable domains rank highly. Through comprehensive graph neural network ablations, we demonstrate that curriculum-based heterogeneous graph models that leverage context from both webgraphs and social media data outperform homogeneous and single-mode approaches. We further demonstrate that the incorporation of dredge words into our model strongly associates unreliable websites with social media and online commerce platforms. Finally, we show our heterogeneous model greatly outperforms competing systems in the top-k identification of unlabeled unreliable websites. We demonstrate the strong unreliability signals present in the diverse paths that users follow to uncover unreliable content, and we release a novel dataset of dredge words.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2405.14938 [pdf, other]

The Supersonic Project: Early Star Formation with the Streaming Velocity

Authors: William Lake, Claire E. Williams, Smadar Naoz, Federico Marinacci, Blakesley Burkhart, Mark Vogelsberger, Naoki Yoshida, Gen Chiaki, Avi Chen, Yeou S. Chiou

Abstract: At high redshifts ($z\gtrsim12$), the relative velocity between baryons and dark matter (the so-called streaming velocity) significantly affects star formation in low-mass objects. Streaming substantially reduces the abundance of low-mass gas objects while simultaneously allowing for the formation of supersonically-induced gas objects (SIGOs) and their associated star clusters outside of dark matter halos. Here, we present a study of the population-level effects of streaming on star formation within both halos and SIGOs in a set of simulations with and without streaming. Notably, we find that streaming actually enhances star formation within individual halos of all masses at redshifts between $z=12$ and $z=20$. This is demonstrated both as an increased star formation rate per object as well as an enhancement of the Kennicutt-Schmidt relation for objects with streaming. We find that our simulations are consistent with some observations at high redshift, but on a population level, they continue to under-predict star formation relative to the majority of observations. However, simulations of overdense regions (both with and without streaming) agree with observations, suggesting a strategy for extracting information about the overdensity and streaming velocity in a given survey volume in future observations.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 16 pages, 8 figures

arXiv:2405.13776 [pdf]

Ballistic Energy Transport via Long Alkyl Chains: A New Initiation Mechanism

Authors: Sithara U. Nawagamuwage, Elliot S. Williams, Md Muhaiminul Islam, Igor V. Parshin, Alexander L. Burin, Nathalie Busschaert, Igor V. Rubtsov

Abstract: In an effort to increase the speed and efficiency of ballistic energy transport via oligomeric chains, we performed measurements of the transport in compounds featuring long alkyl chains of up to 37 methylene units. Compounds of the N3-(CH2)n-COOMe type (denoted as aznME) were synthesized with n = 5, 10, 15, 19, 28, 37 and studied using relaxation-assisted two-dimensional infrared spectroscopy. The speed of the ballistic transport, initiated by the N3 tag excitation, increased ca. 3-fold for the longer chains (n = 19-37) compared to the shorter chains, from 14.7 Å/ps to 48 Å/ps, in line with an earlier prediction (Nawagamuwage et al. 2021, J. Phys. Chem. B, 125, 7546). Modeling, based on numerically solving the Liouville equation, was capable of reproducing the experimental data only if three wavepackets were included, involving CH2 twisting (Tw), wagging (W), and rocking (Ro) chain bands. Approaches for designing molecular systems featuring higher speed and efficiency of energy transport are discussed.

Submitted 22 May, 2024; originally announced May 2024.

Comments: Submitted to JPC

arXiv:2405.06634 [pdf, other]

Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark

Authors: Evan M. Williams, Kathleen M. Carley

Abstract: We evaluate the zero-shot ability of GPT-4 and LLaVa to perform simple Visual Network Analysis (VNA) tasks on small-scale graphs. We evaluate the Vision Language Models (VLMs) on 5 tasks related to three foundational network science concepts: identifying nodes of maximal degree on a rendered graph, identifying whether signed triads are balanced or unbalanced, and counting components. The tasks are structured to be easy for a human who understands the underlying graph theoretic concepts, and can all be solved by counting the appropriate elements in graphs. We find that while GPT-4 consistently outperforms LLaVa, both models struggle with every visual network analysis task we propose. We publicly release the first benchmark for the evaluation of VLMs on foundational VNA tasks.

Submitted 10 June, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: 11 pages, 3 figures

arXiv:2404.08883 [pdf, other]

Projection matrices and the sweep operator

Authors: A. T. James, E. R. Williams

Abstract: These notes have been adapted from an undergraduate course given by Professor Alan James at the University of Adelaide from around 1965 and onwards. This adaption has put a focus on the definition of projection matrices and the sweep operator. These devices were at the heart of the development of the statistical package Genstat which initially focussed on the analysis of variance using the sweep operator. The notes provide an algebraic background to the sweep operator which has since been used to effect in a number of experimental design settings.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 53 pages, 3 figures

arXiv:2404.08869 [pdf, other]

Misinformation Resilient Search Rankings with Webgraph-based Interventions

Authors: Peter Carragher, Evan M. Williams, Kathleen M. Carley

Abstract: The proliferation of unreliable news domains on the internet has had wide-reaching negative impacts on society. We introduce and evaluate interventions aimed at reducing traffic to unreliable news domains from search engines while maintaining traffic to reliable domains. We build these interventions on the principles of fairness (penalize sites for what is in their control), generality (label/fact-check agnostic), targeted (increase the cost of adversarial behavior), and scalability (works at webscale). We refine our methods on small-scale webdata as a testbed and then generalize the interventions to a large-scale webgraph containing 93.9M domains and 1.6B edges. We demonstrate that our methods penalize unreliable domains far more than reliable domains in both settings and we explore multiple avenues to mitigate unintended effects on both the small-scale and large-scale webgraph experiments. These results indicate the potential of our approach to reduce the spread of misinformation and foster a more reliable online information ecosystem. This research contributes to the development of targeted strategies to enhance the trustworthiness and quality of search engine results, ultimately benefiting users and the broader digital community.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2401.02379 [pdf, other]

doi 10.1609/icwsm.v18i1.31309

Detection and Discovery of Misinformation Sources using Attributed Webgraphs

Authors: Peter Carragher, Evan M. Williams, Kathleen M. Carley

Abstract: Website reliability labels underpin almost all research in misinformation detection. However, misinformation sources often exhibit transient behavior, which makes many such labeled lists obsolete over time. We demonstrate that Search Engine Optimization (SEO) attributes provide strong signals for predicting news site reliability. We introduce a novel attributed webgraph dataset with labeled news domains and their connections to outlinking and backlinking domains. We demonstrate the success of graph neural networks in detecting news site reliability using these attributed webgraphs, and show that our baseline news site reliability classifier outperforms current SoTA methods on the PoliticalNews dataset, achieving an F1 score of 0.96. Finally, we introduce and evaluate a novel graph-based algorithm for discovering previously unknown misinformation news sources.

Submitted 26 March, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.12643 [pdf, other]

Quantifying the magnetic noise power spectrum for ensembles of P1 and NV centers in diamond

Authors: Ethan Q. Williams, Chandrasekhar Ramanathan

Abstract: We use Carr-Purcell-Meiboom-Gill (CPMG) dynamical decoupling to measure the magnetic noise power spectra for ensembles of P1 and NV centers in diamond using pulsed electron paramagnetic resonance (pEPR) at 2.5 GHz. The stroboscopically detected pEPR experiments on NV centers were performed on an HPHT (high pressure, high temperature) diamond sample at 13 mT and 190 mT, while the experiments on P1 centers were performed on a CVD (chemical vapor deposition) diamond sample at 89 mT. All power spectra show two distinct features, a broad component that is observed to scale as approximately $1/ω$, and a prominent peak at the $^{13}$C Larmor precession frequency. The broad $1/ω$ behavior is consistent with an inhomogeneous distribution of Lorentzian spectra due to clustering of P1 centers, which has recently been shown to be prevalent in HPHT diamond. However, it is unknown if such clustering occurs in CVD diamond. The maximum rate at which we can apply $π$ pulses is higher than the $^{13}$C frequency at 13 mT, but is lower than the $^{13}$C frequency at 89 mT and 190 mT. We develop techniques that utilize the higher harmonics of the CPMG filter function to improve our estimate of the $^{13}$C contribution to the power spectrum at the higher fields. Surprisingly, the $^{13}$C peak, when measured with higher harmonics of the CPMG filter, appears larger than expected based on measurements with the lower harmonics. We assess the robustness of our methods in the presence of finite pulse widths and flip angle errors. These techniques could be used in a variety of ac magnetometry and noise spectroscopy measurements such as chemical sensing and nanoscale nuclear magnetic resonance.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 30 pages, 20 figures

arXiv:2311.07283 [pdf, other]

Predictive and Prescriptive Analytics for Multi-Site Modeling of Frail and Elderly Patient Services

Authors: Elizabeth Williams, Daniel Gartner, Paul Harper

Abstract: Recent research has highlighted the potential of linking predictive and prescriptive analytics. However, it remains widely unexplored how both paradigms could benefit from one another to address today's major challenges in healthcare. One of these is smarter planning of resource capacities for frail and elderly inpatient wards, addressing the societal challenge of an aging population. Frail and elderly patients typically suffer from multimorbidity and require more care while receiving medical treatment. The aim of this research is to assess how various predictive and prescriptive analytical methods, both individually and in tandem, contribute to addressing the operational challenges within an area of healthcare that is growing in demand. Clinical and demographic patient attributes are gathered from more than 165,000 patient records and used to explain and predict length of stay. To that end, we employ Classification and Regression Trees (CART) analysis to establish this relationship. On the prescriptive side, deterministic and two-stage stochastic programs are developed to determine how to optimally plan for beds and ward staff with the objective to minimize cost. Furthermore, the two analytical methodologies are linked by generating demand for the prescriptive models using the CART groupings. The results show the linked methodologies provided different but similar results compared to using averages and in doing so, captured a more realistic real-world variation in the patient length of stay. Our research reveals that healthcare managers should consider using predictive and prescriptive models to make more informed decisions. By combining predictive and prescriptive analytics, healthcare managers can move away from relying on averages and incorporate the unique characteristics of their patients to create more robust planning decisions, mitigating risks caused by variations in demand.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.03799 [pdf, other]

The Supersonic Project: Lighting up the faint end of the JWST UV luminosity function

Authors: Claire E. Williams, William Lake, Smadar Naoz, Blakesley Burkhart, Tommaso Treu, Federico Marinacci, Yurina Nakazato, Mark Vogelsberger, Naoki Yoshida, Gen Chiaki, Yeou S. Chiou, Avi Chen

Abstract: The James Webb Space Telescope (JWST) is capable of probing extremely early eras of our Universe when the supersonic relative motions between dark matter and baryonic overdensities modulate structure formation ($z \gtrsim 10$). We study low-mass galaxy formation including this "stream velocity" using high resolution AREPO hydrodynamics simulations, and present theoretical predictions of the UV luminosity function (UVLF) and galaxy stellar mass function (GSMF) down to extremely faint and low mass galaxies ($M_{UV} \gtrsim -15$, $10^4 M_\odot \leq M_* \leq 10^8 M_\odot$). We show that, although the stream velocity suppresses early star formation overall, it induces a short period of rapid star formation in some larger dwarfs, leading to an enhancement in the faint-end of the UVLF at $z=12$. We demonstrate that JWST observations are close to this enhanced regime, and propose that the UVLF may constitute an important probe of the stream velocity at high redshift for JWST and future observatories.

Submitted 15 December, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: 13 pages, 7 figures. Accepted ApJL

arXiv:2309.07097 [pdf, other]

Revisiting the classics: On the evolutionary origin of the "Fe II" and "He/N" spectral classes of novae

Authors: E. Aydi, L. Chomiuk, J. Strader, K. V. Sokolovsky, R. E. Williams, D. A. H. Buckley, A. Ederoclite, L. Izzo, R. Kyer, J. D. Linford, A. Kniazev, B. D. Metzger, J. Mikolajewska, P. Molaro, I. Mollina, K. Mukai, U. Munari, M. Orio, T. Panurach, B. J. Shappee, K. J. Shen, J. L. Sokoloski, R. Urquhart, F. M. Walter

Abstract: The optical spectra of novae are characterized by emission lines from the hydrogen Balmer series and either Fe II or He/N, leading to their traditional classification into two spectral classes: "Fe II" and "He/N". For decades, the origins of these spectral features were discussed in the literature in the contexts of different bodies of gas or changes in the opacity of the ejecta, particularly associated with studies by R. E. Williams and S. N. Shore. Here, we revisit these major studies with dedicated, modern data sets, covering the evolution of several novae from early rise to peak all the way to the nebular phase. Our data confirm previous suggestions in the literature that the "Fe II" and "He/N" spectral classes are phases in the spectroscopic evolution of novae driven primarily by changes in the opacity, ionization, and density of the ejecta, and most if not all novae go through at least three spectroscopic phases as their eruptions evolve: an early He/N (phase 1; observed during the early rise to visible peak and characterized by P Cygni lines of He I, N II, and N III), then an Fe II (phase 2; observed near visible peak and characterized by P Cygni lines of Fe II and O I), and then a later He/N (phase 3; observed during the decline and characterized by emission lines of He I, He II, N II, and N III), before entering the nebular phase. This spectral evolution seems to be ubiquitous across novae, regardless of their speed class; however, the duration of each of these phases differs based on the speed class of the nova.

Submitted 27 October, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: 21 pages, 14 figures, 11 tables, Accepted in MNRAS

arXiv:2307.02725 [pdf, other]

Near-Field Wall-Modeled Large-Eddy Simulation of the NASA X-59 Low-Boom Flight Demonstrator

Authors: Emily Williams, Gonzalo Arranz, Adrián Lozano-Durán

Abstract: Wall-modeled large-eddy simulation (WMLES) is utilized to analyze the experimental aircraft X-59 Quiet SuperSonic Technology (QueSST) developed by Lockheed Martin at Skunk Works for NASA's Low-Boom Flight Demonstrator project. The simulations utilize the charLES solver and aim to assess the ability of WMLES to predict near-field noise levels under cruise conditions, considering various subgrid-scale (SGS) models and grid resolutions. The results are compared with previous numerical studies based on the Reynolds-averaged Navier-Stokes (RANS) equations. Our findings demonstrate that WMLES produces near-field pressure predictions that are similar to those of RANS simulations at a comparable computational cost. Some mild discrepancies are observed between the WMLES and RANS predictions downstream of the aircraft. These differences persist for the finest grid refinement considered, suggesting that they might be attributed to underresolved interactions of shock waves and expansion waves at the trailing edge.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.01268 [pdf, other]

DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning

Authors: Edward C. Williams, Grace Su, Sandra R. Schloen, Miller C. Prosser, Susanne Paulus, Sanjay Krishnan

Abstract: Twenty-five hundred years ago, the paperwork of the Achaemenid Empire was recorded on clay tablets. In 1933, archaeologists from the University of Chicago's Oriental Institute (OI) found tens of thousands of these tablets and fragments during the excavation of Persepolis. Many of these tablets have been painstakingly photographed and annotated by expert cuneiformists, and now provide a rich dataset consisting of over 5,000 annotated tablet images and 100,000 cuneiform sign bounding boxes. We leverage this dataset to develop DeepScribe, a modular computer vision pipeline capable of localizing cuneiform signs and providing suggestions for the identity of each sign. We investigate the difficulty of learning subtasks relevant to cuneiform tablet transcription on ground-truth data, finding that a RetinaNet object detector can achieve a localization mAP of 0.78 and a ResNet classifier can achieve a top-5 sign classification accuracy of 0.89. The end-to-end pipeline achieves a top-5 classification accuracy of 0.80. As part of the classification module, DeepScribe groups cuneiform signs into morphological clusters. We consider how this automatic clustering approach differs from the organization of standard, printed sign lists and what we may learn from it. These components, trained individually, are sufficient to produce a system that can analyze photos of cuneiform tablets from the Achaemenid period and provide useful transliteration suggestions to researchers. We evaluate the model's end-to-end performance on locating and classifying signs, providing a roadmap to a linguistically-aware transliteration system, then consider the model's potential utility when applied to other periods of cuneiform writing.

Submitted 2 June, 2023; originally announced June 2023.

Comments: Currently under review in the ACM JOCCH

arXiv:2306.01047 [pdf, other]

The Supersonic Project: Star Formation in Early Star Clusters without Dark Matter

Authors: William Lake, Smadar Naoz, Federico Marinacci, Blakesley Burkhart, Mark Vogelsberger, Claire E. Williams, Yeou S. Chiou, Gen Chiaki, Yurina Nakazato, Naoki Yoshida

Abstract: The formation mechanism of globular clusters (GCs) has long been debated by astronomers. It was recently proposed that Supersonically Induced Gas Objects (SIGOs), which formed in the early Universe due to the supersonic relative motion of baryons and dark matter at recombination, could be the progenitors of early globular clusters. In order to become GCs, SIGOs must form stars relatively efficiently despite forming outside of dark matter halos. We investigate the potential for star formation in SIGOs using cosmological hydrodynamic simulations, including the aforementioned relative motions of baryons and dark matter, molecular hydrogen cooling in primordial gas clouds, and explicit star formation. We find that SIGOs do form stars and that the nascent star clusters formed through this process are accreted by dark matter halos on short timescales (a few hundreds of Myr). Thus, SIGOs may be found as intact substructures within these halos, analogous to many present-day GCs. From this result, we conclude that SIGOs are capable of forming star clusters with similar properties to globular clusters in the early Universe and we discuss their detectability by upcoming JWST surveys.

Submitted 18 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 12 pages, 5 figures

arXiv:2304.08393 [pdf, other]

Search for gravitational-lensing signatures in the full third observing run of the LIGO-Virgo network

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin , et al. (1670 additional authors not shown)

Abstract: Gravitational lensing by massive objects along the line of sight to the source causes distortions of gravitational-wave signals; such distortions may reveal information about fundamental physics, cosmology and astrophysics. In this work, we have extended the search for lensing signatures to all binary black hole events from the third observing run of the LIGO-Virgo network. We search for repeated signals from strong lensing by 1) performing targeted searches for subthreshold signals, 2) calculating the degree of overlap amongst the intrinsic parameters and sky location of pairs of signals, 3) comparing the similarities of the spectrograms amongst pairs of signals, and 4) performing dual-signal Bayesian analysis that takes into account selection effects and astrophysical knowledge. We also search for distortions to the gravitational waveform caused by 1) frequency-independent phase shifts in strongly lensed images, and 2) frequency-dependent modulation of the amplitude and phase due to point masses. None of these searches yields significant evidence for lensing. Finally, we use the non-detection of gravitational-wave lensing to constrain the lensing rate based on the latest merger-rate estimates and the fraction of dark matter composed of compact objects.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 28 pages, 11 figures

Report number: LIGO-P2200031

arXiv:2304.04306 [pdf, other]

doi 10.1093/mnras/stad1914

Catching a nova X-ray/UV flash in the visible? Early spectroscopy of the extremely slow Nova Velorum 2022 (Gaia22alz)

Authors: E. Aydi, L. Chomiuk, J. Mikołajewska, J. Brink, B. D. Metzger, J. Strader, D. A. H. Buckley, E. J. Harvey, T. W. -S. Holoien, L. Izzo, A. Kawash, J. D. Linford, P. Molaro, B. Mollina, P. Mróz, K. Mukai, M. Orio, T. Panurach, P. Senchyna, B. J. Shappee, K. J. Shen, J. L. Sokoloski, K. V. Sokolovsky, R. Urquhart, R. E. Williams

Abstract: We present early spectral observations of the very slow Galactic nova Gaia22alz, over its gradual rise to peak brightness that lasted 180 days. During the first 50 days, when the nova was only 3--4 magnitudes above its normal brightness, the spectra showed narrow (FWHM $\approx$ 400 km s$^{-1}$) emission lines of H Balmer, He I, He II, and C IV, but no P Cygni absorption. A few weeks later, the high-excitation He II and C IV lines disappeared, and P Cygni profiles of Balmer, He I, and eventually Fe II lines emerged, yielding a spectrum typical of classical novae before peak. We propose that the early spectra of Gaia22alz are produced in the white dwarf's envelope or accretion disk, reprocessing X-ray and ultraviolet emission from the white dwarf after a dramatic increase in the rate of thermonuclear reactions, during a phase known as the ``early X-ray/UV flash''. If true, this would be one of the rare times that the optical signature of the early X-ray/UV flash has been detected. While this phase might last only a few hours in other novae and thus be easily missed, it was possible to detect in Gaia22alz due to its very slow and gradual rise and thanks to the efficiency of new all-sky surveys in detecting transients on their rise. We also consider alternative scenarios that could explain the early spectral features of Gaia22alz and its unusually slow rise.

Submitted 9 April, 2023; originally announced April 2023.

Comments: 20 pages, 12 figures, 2 tables. Submitted to MNRAS

arXiv:2304.02577 [pdf, other]

ECG Feature Importance Rankings: Cardiologists vs. Algorithms

Authors: Temesgen Mehari, Ashish Sundar, Alen Bosnjakovic, Peter Harris, Steven E. Williams, Axel Loewe, Olaf Doessel, Claudia Nagel, Nils Strodthoff, Philip J. Aston

Abstract: Feature importance methods promise to provide a ranking of features according to importance for a given classification task. A wide range of methods exist but their rankings often disagree and they are inherently difficult to evaluate due to a lack of ground truth beyond synthetic datasets. In this work, we put feature importance methods to the test on real-world data in the domain of cardiology, where we try to distinguish three specific pathologies from healthy subjects based on ECG features, comparing to features used in cardiologists' decision rules as ground truth. Some methods generally performed well and others performed poorly, while some methods did well on some but not all of the problems considered.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2303.10721 [pdf, other]

Right the docs: Characterising voice dataset documentation practices used in machine learning

Authors: Kathy Reid, Elizabeth T. Williams

Abstract: Voice-enabled technology is quickly becoming ubiquitous, and is constituted from machine learning (ML)-enabled components such as speech recognition and voice activity detection. However, these systems don't yet work well for everyone. They exhibit bias - the systematic and unfair discrimination against individuals or cohorts of individuals in favour of others (Friedman & Nissembaum, 1996) - across axes such as age, gender and accent. ML is reliant on large datasets for training. Dataset documentation is designed to give ML Practitioners (MLPs) a better understanding of a dataset's characteristics. However, there is a lack of empirical research on voice dataset documentation specifically. Additionally, while MLPs are frequent participants in fairness research, little work focuses on those who work with voice data. Our work makes an empirical contribution to this gap. Here, we combine two methods to form an exploratory study. First, we undertake 13 semi-structured interviews, exploring multiple perspectives of voice dataset documentation practice. Using open and axial coding methods, we explore MLPs' practices through the lenses of roles and tradeoffs. Drawing from this work, we then purposively sample voice dataset documents (VDDs) for 9 voice datasets. Our findings then triangulate these two methods, using the lenses of MLP roles and trade-offs. We find that current VDD practices are inchoate, inadequate and incommensurate. The characteristics of voice datasets are codified in fragmented, disjoint ways that often do not meet the needs of MLPs. Moreover, they cannot be readily compared, presenting a barrier to practitioners' bias reduction efforts. We then discuss the implications of these findings for bias practices in voice data and speech technologies. We conclude by setting out a program of future work to address these findings -- that is, how we may "right the docs".

Submitted 19 March, 2023; originally announced March 2023.

Comments: 16 pages, 3 tables, preprint of a submission to AIES 2023

ACM Class: K.4

arXiv:2302.12431 [pdf, other]

Flexible Phase Dynamics for Bio-Plausible Contrastive Learning

Authors: Ezekiel Williams, Colin Bredenberg, Guillaume Lajoie

Abstract: Many learning algorithms used as normative models in neuroscience or as candidate approaches for learning on neuromorphic chips learn by contrasting one set of network states with another. These Contrastive Learning (CL) algorithms are traditionally implemented with rigid, temporally non-local, and periodic learning dynamics that could limit the range of physical systems capable of harnessing CL. In this study, we build on recent work exploring how CL might be implemented by biological or neuromorphic systems and show that this form of learning can be made temporally local, and can still function even if many of the dynamical requirements of standard training procedures are relaxed. Thanks to a set of general theorems corroborated by numerical experiments across several CL models, our results provide theoretical foundations for the study and development of CL methods for biological and neuromorphic neural networks.

Submitted 30 August, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: 23 pages, 4 figures. Paper accepted to ICML and update includes changes made based on reviewer feedback

Journal ref: PMLR 202:37042-37065, 2023

arXiv:2302.02972 [pdf, other]

Concrete Safety for ML Problems: System Safety for ML Development and Assessment

Authors: Edgar W. Jatho, Logan O. Mailloux, Eugene D. Williams, Patrick McClure, Joshua A. Kroll

Abstract: Many stakeholders struggle to rely on ML-driven systems due to the risk of harm these systems may cause. Concerns of trustworthiness, unintended social harms, and unacceptable social and ethical violations undermine the promise of ML advancements. Moreover, such risks in complex ML-driven systems present a special challenge as they are often difficult to foresee, arising over periods of time, across populations, and at scale. These risks often arise not from poor ML development decisions or low performance directly but rather emerge through the interactions amongst ML development choices, the context of model use, environmental factors, and the effects of a model on its target. Systems safety engineering is an established discipline with a proven track record of identifying and managing risks even in high-complexity sociotechnical systems. In this work, we apply a state-of-the-art systems safety approach to concrete applications of ML with notable social and ethical risks to demonstrate a systematic means for meeting the assurance requirements needed to argue for safe and trustworthy ML in sociotechnical systems.

Submitted 6 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2211.04602

arXiv:2301.06998 [pdf, other]

Evaluation of an Open-Source Pipeline to Create Patient-Specific Left Atrial Models: A Reproducibility Study

Authors: Jose Alonso Solis-Lemus, Tiffany Baptiste, Rosie Barrows, Charles Sillett, Ali Gharaviri, Giulia Raffaele, Orod Razeghi, Marina Strocchi, Iain Sim, Irum Kotadia, Neil Bodagh, Daniel O'Hare, Mark O'Neill, Steven E Williams, Caroline Roney, Steven Niederer

Abstract: We present an open-source software pipeline to create patient-specific left atrial (LA) models with fibre orientations and a fibrosis map, suitable for electrophysiology simulations. The semi-automatic pipeline takes as input a contrast enhanced magnetic resonance angiogram, and a late gadolinium enhanced (LGE) contrast magnetic resonance (CMR). Five operators were allocated 20 cases each from a set of 50 CMR datasets to create a total of 100 models to evaluate inter/intra-operator variability. Each output model consisted of (1) a labelled surface mesh open at the pulmonary veins (PV) and mitral valve (MV), (2) fibre orientations mapped from a diffusion tensor MRI human atlas, (3) fibrosis map from the LGE-CMR scan, and (4) simulation of local activation time (LAT) and phase singularity (PS) mapping. We evaluated reproducibility in our pipeline by comparing agreement in shape of the output meshes, fibrosis distribution in the LA body, and fibre orientations; simulation outputs were evaluated by comparing total activation times of LAT maps, mean conduction velocity (CV), and structural similarity index measure (SSIM) of PS maps. Our workflow allows a single model to be created in 16.72 +/- 12.25 minutes. Results in this abstract are reported as inter/intra. Shape only differed noticeably with users' selection of the MV and the length of the PV from the ostia to the distal end; fibrosis agreement (0.91/0.99 ICC) and fibre orientation agreement (60.63/71.77 %) were high. LAT maps showed good agreement; the median of the absolute difference of the total activation times was 2.02ms/1.37ms. The average of the mean CV difference was -4.04mm/s / 2.1mm/s. PS maps showed a moderately good agreement with SSIM of 0.648/0.608. Although we found notable differences in the models due to user input, our tests show that operator variability was comparable to that of image resolution or fibre estimation.

Submitted 9 May, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: 17 pages, 7 figures, submitted for review at Journal of Computers in Biology and Medicine (in press)

arXiv:2212.14124 [pdf]

Joint Action is a Framework for Understanding Partnerships Between Humans and Upper Limb Prostheses

Authors: Michael R. Dawson, Adam S. R. Parker, Heather E. Williams, Ahmed W. Shehata, Jacqueline S. Hebert, Craig S. Chapman, Patrick M. Pilarski

Abstract: Recent advances in upper limb prostheses have led to significant improvements in the number of movements provided by the robotic limb. However, the method for controlling multiple degrees of freedom via user-generated signals remains challenging. To address this issue, various machine learning controllers have been developed to better predict movement intent. As these controllers become more intelligent and take on more autonomy in the system, the traditional approach of representing the human-machine interface as a human controlling a tool becomes limiting. One possible approach to improve the understanding of these interfaces is to model them as collaborative, multi-agent systems through the lens of joint action. The field of joint action has been commonly applied to two human partners who are trying to work jointly together to achieve a task, such as singing or moving a table together, by effecting coordinated change in their shared environment. In this work, we compare different prosthesis controllers (proportional electromyography with sequential switching, pattern recognition, and adaptive switching) in terms of how they present the hallmarks of joint action. The results of the comparison lead to a new perspective for understanding how existing myoelectric systems relate to each other, along with recommendations for how to improve these systems by increasing the collaborative communication between each partner.

Submitted 28 December, 2022; originally announced December 2022.

Comments: Submitted to Frontiers in Neurorobotics

arXiv:2212.05120 [pdf, other]

Wall-modeled large-eddy simulation based on building-block flows

Authors: Yuenong Ling, Gonzalo Arranz, Emily Williams, Konrad Goc, Kevin Griffin, Adrián Lozano-Durán

Abstract: A unified subgrid-scale (SGS) and wall model for large-eddy simulation (LES) is proposed by devising the flow as a collection of building blocks that enables the prediction of the eddy viscosity. The core assumption of the model is that simple canonical flows contain the essential physics to provide accurate predictions of the SGS tensor in more complex flows. The model is constructed to predict zero-pressure-gradient wall-bounded turbulence, adverse pressure gradient effects, separation and laminar flow. The approach is implemented using a Bayesian classifier, which identifies the contribution of each building block in the flow, and a neural-network-based predictor, which estimates the eddy viscosity based on the building-block units. The training data are directly obtained from wall-modeled LES with an exact SGS/wall model for the mean quantities to guarantee consistency with the numerical discretization. The model is validated in canonical flows and the NASA High-Lift Common Research Model and shown to improve the predictions with respect to current modeling approaches.

Submitted 17 December, 2022; v1 submitted 9 December, 2022; originally announced December 2022.

arXiv:2212.01477 [pdf, other]

doi 10.1093/mnras/stad3120

Search for subsolar-mass black hole binaries in the second part of Advanced LIGO's and Advanced Virgo's third observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin, et al. (1680 additional authors not shown)

Abstract: We describe a search for gravitational waves from compact binaries with at least one component with mass 0.2 $M_\odot$ -- $1.0 M_\odot$ and mass ratio $q \geq 0.1$ in Advanced LIGO and Advanced Virgo data collected between 1 November 2019, 15:00 UTC and 27 March 2020, 17:00 UTC. No signals were detected. The most significant candidate has a false alarm rate of 0.2 $\mathrm{yr}^{-1}$. We estimate the sensitivity of our search over the entirety of Advanced LIGO's and Advanced Virgo's third observing run, and present the most stringent limits to date on the merger rate of binary black holes with at least one subsolar-mass component. We use the upper limits to constrain two fiducial scenarios that could produce subsolar-mass black holes: primordial black holes (PBH) and a model of dissipative dark matter. The PBH model uses recent prescriptions for the merger rate of PBH binaries that include a rate suppression factor to effectively account for PBH early binary disruptions. If the PBHs are monochromatically distributed, we can exclude a dark matter fraction in PBHs $f_\mathrm{PBH} \gtrsim 0.6$ (at 90% confidence) in the probed subsolar-mass range. However, if we allow for broad PBH mass distributions we are unable to rule out $f_\mathrm{PBH} = 1$. For the dissipative model, where the dark matter has chemistry that allows a small fraction to cool and collapse into black holes, we find an upper bound $f_{\mathrm{DBH}} < 10^{-5}$ on the fraction of atomic dark matter collapsed into black holes.

Submitted 26 January, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: https://dcc.ligo.org/P2200139

arXiv:2211.16120 [pdf]

doi 10.1109/TMAG.2022.3151147

Detecting Magnetic Ink Barcodes with Handheld Magnetoresistive Sensors

Authors: Sofia Abrunhosa, Ian Gibb, Rita Macedo, Emrys Williams, Nathalie Muller, Paulo P. Freitas, Susana Cardoso

Abstract: Information encoding in barcodes using magnetic-based technology is a unique strategy to read data buried underneath non-transparent surfaces since a direct line-of-sight between the code and the reader is not required. This technology is of particular interest in secure labelling and recyclable packaging applications. However, current magnetic reading heads, such as those employed for magnetic ink character recognition, need to be placed in contact with the magnetic structures, limiting the depths at which the information can be read. This paper describes a strategy to overcome that limitation by replacing the traditional inductive heads with tunnel magnetoresistive (TMR) sensors. Soft-magnetic codes can be printed using conventional LaserJet toners and, by having their magnetisation set with a permanent magnet included in the device, the resulting magnetic field can be read using a TMR sensor. We demonstrate that such a device can read barcodes at depths of at least 1 mm. It can also resolve individual structures as thin as 200 μm when used in contact.

Submitted 29 November, 2022; originally announced November 2022.

Comments: 4 pages, 5 figures, Manuscript accepted for publication in IEEE Transactions on Magnetics

Journal ref: IEEE Transactions on Magnetics 58, 4002304 (2022)

arXiv:2211.15997 [pdf, other]

MedalCare-XL: 16,900 healthy and pathological 12 lead ECGs obtained through electrophysiological simulations

Authors: Karli Gillette, Matthias A. F. Gsell, Claudia Nagel, Jule Bender, Bejamin Winkler, Steven E. Williams, Markus Bär, Tobias Schäffter, Olaf Dössel, Gernot Plank, Axel Loewe

Abstract: Mechanistic cardiac electrophysiology models allow for personalized simulations of the electrical activity in the heart and the ensuing electrocardiogram (ECG) on the body surface. As such, synthetic signals possess known ground truth labels of the underlying disease and can be employed for validation of machine learning ECG analysis tools in addition to clinical signals. Recently, synthetic ECGs were used to enrich sparse clinical data or even replace them completely during training leading to improved performance on real-world clinical test data. We thus generated a novel synthetic database comprising a total of 16,900 12 lead ECGs based on electrophysiological simulations equally distributed into healthy control and 7 pathology classes. The pathological case of myocardial infarction had 6 sub-classes. A comparison of extracted features between the virtual cohort and a publicly available clinical ECG database demonstrated that the synthetic signals represent clinical ECGs for healthy and pathological subpopulations with high fidelity. The ECG database is split into training, validation, and test folds for development and objective assessment of novel machine learning algorithms.

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.15784 [pdf, other]

A Survey of Relevant Text Mining Technology

Authors: Claudia Peersman, Matthew Edwards, Emma Williams, Awais Rashid

Abstract: Recent advances in text mining and natural language processing technology have enabled researchers to detect an author's identity or demographic characteristics, such as age and gender, in several text genres by automatically analysing the variation of linguistic characteristics. However, applying such techniques in the wild, i.e., in both cybercriminal and regular online social media, differs from more general applications in that its defining characteristics are both domain and process dependent. This gives rise to a number of challenges of which contemporary research has only scratched the surface. More specifically, a text mining approach applied on social media communications typically has no control over the dataset size, since the number of available communications will vary across users. Hence, the system has to be robust towards limited data availability. Additionally, the quality of the data cannot be guaranteed. As a result, the approach needs to be tolerant to a certain degree of linguistic noise (for example, abbreviations, non-standard language use, spelling variations and errors). Finally, in the context of cybercriminal fora, it has to be robust towards deceptive or adversarial behaviour, i.e. offenders who attempt to hide their criminal intentions (obfuscation) or who assume a false digital persona (imitation), potentially using coded language. In this work we present a comprehensive survey that discusses the problems that have already been addressed in current literature and review potential solutions. Additionally, we highlight which areas need to be given more attention.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.06989 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095729

Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing

Authors: Jacob J Webber, Cassia Valentini-Botinhao, Evelyn Williams, Gustav Eje Henter, Simon King

Abstract: Most state-of-the-art Text-to-Speech systems use the mel-spectrogram as an intermediate representation, to decompose the task into acoustic modelling and waveform generation. A mel-spectrogram is extracted from the waveform by a simple, fast DSP operation, but generating a high-quality waveform from a mel-spectrogram requires computationally expensive machine learning: a neural vocoder. Our proposed ``autovocoder'' reverses this arrangement. We use machine learning to obtain a representation that replaces the mel-spectrogram, and that can be inverted back to a waveform using simple, fast operations including a differentiable implementation of the inverse STFT. The autovocoder generates a waveform 5 times faster than the DSP-based Griffin-Lim algorithm, and 14 times faster than the neural vocoder HiFi-GAN. We provide perceptual listening test results to confirm that the speech is of comparable quality to HiFi-GAN in the copy synthesis task.

Submitted 24 May, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: Accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5

arXiv:2211.04602 [pdf, other]

System Safety Engineering for Social and Ethical ML Risks: A Case Study

Authors: Edgar W. Jatho III, Logan O. Mailloux, Shalaleh Rismani, Eugene D. Williams, Joshua A. Kroll

Abstract: Governments, industry, and academia have undertaken efforts to identify and mitigate harms in ML-driven systems, with a particular focus on social and ethical risks of ML components in complex sociotechnical systems. However, existing approaches are largely disjointed, ad-hoc and of unknown effectiveness. Systems safety engineering is a well established discipline with a track record of identifying and managing risks in many complex sociotechnical domains. We adopt the natural hypothesis that tools from this domain could serve to enhance risk analyses of ML in its context of use. To test this hypothesis, we apply a "best of breed" systems safety analysis, Systems Theoretic Process Analysis (STPA), to a specific high-consequence system with an important ML-driven component, namely the Prescription Drug Monitoring Programs (PDMPs) operated by many US States, several of which rely on an ML-derived risk score. We focus in particular on how this analysis can extend to identifying social and ethical risks and developing concrete design-level controls to mitigate them.

Submitted 8 November, 2022; originally announced November 2022.

Comments: 14 pages, 5 figures, 3 tables. Accepted to 36th Conference on Neural Information Processing Systems, Workshop on ML Safety (NeurIPS 2022)

arXiv:2211.02066 [pdf, other]

doi 10.3847/1538-4357/acb820

The Supersonic Project: The eccentricity and rotational support of SIGOs and DM GHOSts

Authors: Claire E. Williams, Smadar Naoz, William Lake, Yeou S. Chiou, Blakesley Burkhart, Federico Marinacci, Mark Vogelsberger, Gen Chiaki, Yurina Nakazato, Naoki Yoshida

Abstract: A supersonic relative velocity between dark matter (DM) and baryons (the stream velocity) at the time of recombination induces the formation of low mass objects with anomalous properties in the early Universe. We widen the scope of the `Supersonic Project' paper series to include objects we term Dark Matter + Gas Halos Offset by Streaming (DM GHOSts)--diffuse, DM-enriched structures formed because of a physical offset between the centers of mass of DM and baryonic overdensities. We present an updated numerical investigation of DM GHOSts and Supersonically Induced Gas Objects (SIGOs), including the effects of molecular cooling, in high resolution hydrodynamic simulations using the AREPO code. Supplemented by an analytical understanding of their ellipsoidal gravitational potentials, we study the population-level properties of these objects, characterizing their morphology, spin, radial mass, and velocity distributions in comparison to classical structures in non-streaming regions. The stream velocity causes deviations from sphericity in both the gas and DM components and lends greater rotational support to the gas. Low mass ($<\sim 10^{5.5}$ M$_\odot$) objects in regions of streaming demonstrate core-like rotation and mass profiles. Anomalies in the rotation and morphology of DM GHOSts could represent an early Universe analogue to observed ultra-faint dwarf galaxies with variations in DM content and unusual rotation curves.

Submitted 13 February, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: 27 pages, 20 figures. Accepted ApJ

arXiv:2209.02863 [pdf]

doi 10.3847/2041-8213/aca1b0

Model-based cross-correlation search for gravitational waves from the low-mass X-ray binary Scorpius X-1 in LIGO O3 data

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin, et al. (1670 additional authors not shown)

Abstract: We present the results of a model-based search for continuous gravitational waves from the low-mass X-ray binary Scorpius X-1 using LIGO detector data from the third observing run of Advanced LIGO, Advanced Virgo and KAGRA. This is a semicoherent search which uses details of the signal model to coherently combine data separated by less than a specified coherence time, which can be adjusted to balance sensitivity with computing cost. The search covered a range of gravitational-wave frequencies from 25 Hz to 1600 Hz, as well as ranges in orbital speed, frequency and phase determined from observational constraints. No significant detection candidates were found, and upper limits were set as a function of frequency. The most stringent limits, between 100 Hz and 200 Hz, correspond to an amplitude h0 of about 1e-25 when marginalized isotropically over the unknown inclination angle of the neutron star's rotation axis, or less than 4e-26 assuming the optimal orientation. The sensitivity of this search is now probing amplitudes predicted by models of torque balance equilibrium. For the usual conservative model assuming accretion at the surface of the neutron star, our isotropically-marginalized upper limits are close to the predicted amplitude from about 70 Hz to 100 Hz; the limits assuming the neutron star spin is aligned with the most likely orbital angular momentum are below the conservative torque balance predictions from 40 Hz to 200 Hz. Assuming a broader range of accretion models, our direct limits on gravitational-wave amplitude delve into the relevant parameter space over a wide range of frequencies, to 500 Hz or more.

Submitted 2 January, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: 19 pages, Open Access Journal PDF

Report number: LIGO-P2100110-v13

Journal ref: The Astrophysical Journal Letters, 941, L30 (2022)

arXiv:2208.09630 [pdf, other]

doi 10.1051/0004-6361/202244498

Persistent nuclear burning in Nova Sgr 2016 N.4 (= V5856 Sgr = ASASSN-16ma) six years past its outburst

Authors: U. Munari, N. Masetti, F. M. Walter, R. E. Williams, F. -J. Hambsch, A. Frigo, P. Valisa

Abstract: We report on the fast Nova Sgr 2016 N.4 being surprisingly trapped in a long-lasting and bright plateau (Delta I >= 10 mag above quiescence) six years past the nova eruption. Very few other novae experience a similar occurrence. We carried out an intensive observing campaign collecting daily BVRI photometry and monthly high-resolution optical spectroscopy, and observed the nova in ultraviolet and X-rays with the Swift satellite at five distinct epochs. The bolometric luminosity radiated during the plateau is ~4200 Lsun (scaled to the distance of the Galactic Bulge), corresponding to stable nuclear burning on a 0.6 Msun white dwarf. A stable wind is blown off at FWZI~1600 km/s, with episodic reinforcement of a faster FWZI~3400 km/s mass loss, probably oriented along the polar directions. The collision of these winds could power the emission detected in X-rays. The burning shell has an outer radius of ~25 Rsun at which the effective temperature is ~7600 K, values similar to those of an F0 II/Ib bright giant. The Delta m < 1 mag variability displayed during the plateau is best described as chaotic, with the irregular appearance of quasi-periodic oscillations with a periodicity of 15-17 days. A limited amount of dust (~3x10^(-11) Msun) continuously condenses at T(dust)~1200 K in the outflowing wind, radiating L(dust)~52 Lsun.

Submitted 20 August, 2022; originally announced August 2022.

Comments: Accepted for publication in A&A

Journal ref: A&A 667, A7 (2022)

arXiv:2208.05987 [pdf, other]

doi 10.3847/1538-4357/acac8d

The Supersonic Project: The Early Evolutionary Path of SIGOs

Authors: William Lake, Smadar Naoz, Blakesley Burkhart, Federico Marinacci, Mark Vogelsberger, Gen Chiaki, Yeou S. Chiou, Naoki Yoshida, Yurina Nakazato, Claire E. Williams

Abstract: Supersonically Induced Gas Objects (SIGOs) are a class of early Universe objects that have gained attention as a potential formation route for globular clusters. SIGOs have only recently begun to be studied in the context of molecular hydrogen cooling, which is key to characterizing their structure and evolution. Studying the population-level properties of SIGOs with molecular cooling is important for understanding their potential for collapse and star formation, and central for addressing whether SIGOs can survive to the present epoch. Here, we investigate the evolution of SIGOs before they form stars, using a combination of numerical and analytical analysis. For example, we study various timescales important to the evolution of SIGOs at a population level in the presence of molecular cooling. Revising the previous formulation for the critical density of collapse for SIGOs allows us to show that their prolateness tends to act as an inhibiting factor to collapse. We find that simulated SIGOs are limited by artificial two-body relaxation effects that tend to disperse them, an effect of their limited resolution. We expect that SIGOs in nature will be longer-lived compared to our simulations. Further, the fall-back timescale on which SIGOs fall into nearby dark matter halos, potentially producing a globular-cluster-like system, is frequently longer than their cooling timescale and the collapse timescale on which they shrink through gravity. Therefore, some SIGOs have time to cool and collapse outside of halos despite initially failing to exceed the critical density, even without considering metal line cooling. From this analysis we conclude that SIGOs should form stars outside of halos in non-negligible stream velocity patches in the Universe.

Submitted 9 January, 2023; v1 submitted 11 August, 2022; originally announced August 2022.

Comments: 18 pages, 11 figures

arXiv:2207.11961 [pdf, other]

doi 10.1021/acs.jpcc.2c06145

Large Room Temperature Bulk DNP of $^{13}$C via P1 Centers in Diamond

Authors: Daphna Shimon, Kelly A. Cantwell, Linta Joseph, Ethan Q. Williams, Zaili Peng, Susumu Takahashi, Chandrasekhar Ramanathan

Abstract: We use microwave-induced dynamic nuclear polarization (DNP) of the substitutional nitrogen defects (P1 centers) in diamond to hyperpolarize bulk $^{13}$C nuclei in both single crystal and powder samples at room temperature at 3.34 T. The large ($>100$-fold) enhancements demonstrated correspond to a greater than 10,000-fold improvement in terms of signal averaging of the 1\% abundant $^{13}$C spins. The DNP was performed using low-power solid state sources under static (non-spinning) conditions. The DNP spectrum (DNP enhancement as a function of microwave frequency) of diamond powder shows features that broadly correlate with the EPR spectrum. A well-defined negative Overhauser peak and two solid effect peaks are observed for the central ($m_I=0$) manifold of the $^{14}$N spins. Previous low temperature measurements in diamond had measured a positive Overhauser enhancement in this manifold. Frequency-chirped millimeter-wave excitation of the electron spins is seen to significantly improve the enhancements for the two outer nuclear spin manifolds ($m_I = \pm 1$) and to blur some of the sharper features associated with the central manifolds. The outer lines are best fit using a combination of the cross effect and a truncated cross effect, which is known to mimic features of an Overhauser effect. Similar features are also observed in experiments on single crystal samples. The observation of all of these mechanisms in a single material system under the same experimental conditions is likely due to the significant heterogeneity of the high pressure, high temperature (HPHT) type Ib diamond samples used. Large room temperature DNP enhancements at fields above a few Tesla enable spectroscopic studies with better chemical shift resolution under ambient conditions.

Submitted 25 July, 2022; originally announced July 2022.

arXiv:2203.13179 [pdf, other]

Automatic User Profiling in Darknet Markets: a Scalability Study

Authors: Claudia Peersman, Matthew Edwards, Emma Williams, Awais Rashid

Abstract: In this study, we investigate the scalability of state-of-the-art user profiling technologies across different online domains. More specifically, this work aims to understand the reliability and limitations of current computational stylometry approaches when these are applied to underground fora in which user populations potentially differ from other online platforms (predominantly male, younger age and greater computer use) and cyber offenders who attempt to hide their identity. Because no ground truth is available and no validated criminal data from historic investigations is available for validation purposes, we have collected new data from clearweb forums that do include user demographics and could be more closely related to underground fora in terms of user population (e.g., tech communities) than commonly used social media benchmark datasets showing a more balanced user population.

Submitted 24 March, 2022; originally announced March 2022.

arXiv:2203.08642 [pdf, other]

Understanding motivations and characteristics of financially-motivated cybercriminals

Authors: Claudia Peersman, Emma Williams, Matthew Edwards, Awais Rashid

Abstract: Background: Cyber offences, such as hacking, malware creation and distribution, and online fraud, present a substantial threat to organizations attempting to safeguard their data and information. By understanding the evolving characteristics and motivations of individuals involved in these activities, and the threats that they may pose, cyber security practitioners will be better placed to understand and assess current threats to their systems and the range of socio-technical mitigations that may best reduce these. Aim: The reported work-in-progress aims to explore the extent to which findings from prior academic literature regarding the characteristics and motivations of offenders engaging in financially-motivated, cyber-dependent crime are supported by the contemporary experiences and perspectives of practitioners currently working in the cyber crime field. Method: A targeted, online survey was developed consisting of both closed and open-ended questions relating to current cyber threats and the characteristics and motivations of offenders engaged in these activities. Sixteen practitioners working in law enforcement-related domains in the cyber crime field completed the survey, providing a combination of qualitative and quantitative data for analysis.

Submitted 28 March, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

arXiv:2203.00720 [pdf, other]

doi 10.3847/1538-4357/ac39a4

The Long Tails of the Pegasus-Pisces Arch Intermediate Velocity Cloud

Authors: R. L. Shelton, M. E. Williams, M. C. Parker, J. E. Galyardt, Y. Fukui, K. Tachihara

Abstract: We present hydrodynamic simulations of the Pegasus-Pisces Arch (PP Arch), an intermediate velocity cloud in our Galaxy. The PP Arch, also known as IVC 86-36, is unique among intermediate and high velocity clouds, because its twin tails are unusually long and narrow. Its -50 km/s line-of-sight velocity qualifies it as an intermediate velocity cloud, but the tails' orientations indicate that the cloud's total three-dimensional speed is at least ~100 km/s. This speed is supersonic in the Reynolds Layer and thick disk. We simulated the cloud as it travels supersonically through the Galactic thick and thin disks at an oblique angle relative to the midplane. Our simulated clouds grow long double tails and reasonably reproduce the H I 21 cm intensity and velocity of the head of the PP Arch. A bow shock protects each simulated cloud from excessive shear and lowers its Reynolds number. These factors may similarly protect the PP Arch and enable the survival of its unusually long tails. The simulations predict the future hydrodynamic behavior of the cloud when it collides with denser gas nearer to the Galactic midplane. It appears that the PP Arch's fate is to deform, dissipate, and merge with the Galactic disk.

Submitted 1 March, 2022; originally announced March 2022.

Comments: 17 pages in this format although it is only 15 in the ApJ; 6 multi-panel figures; recently published in the Astrophysical Journal

Journal ref: ApJ 925 190 (2022)

arXiv:2202.07419 [pdf, other]

Characterising Cybercriminals: A Review

Authors: Matthew Edwards, Emma Williams, Claudia Peersman, Awais Rashid

Abstract: This review provides an overview of current research on the known characteristics and motivations of offenders engaging in cyber-dependent crimes. Due to the shifting dynamics of cybercriminal behaviour, and the availability of prior reviews in 2013, this review focuses on original research conducted from 2012 onwards, although some older studies that were not included in prior reviews are also considered. As a basis for interpretation of results, a limited quality assessment was also carried out on included studies through examination of key indicators.

Submitted 15 February, 2022; originally announced February 2022.

arXiv:2202.05372 [pdf]

doi 10.3847/1538-4357/ac37ba

Hydrodynamics of Clustered Clouds: Drafting, Survival, Condensation, and Ablation

Authors: M. Elliott Williams, Robin L. Shelton

Abstract: For et al., who catalogued Magellanic Stream (MS) clouds, suggested that there is substantial large-scale turbulence in the MS. Here we follow up with a series of FLASH simulations that model the hydrodynamic effects that clouds have on each other. The suite of simulations includes a range of cloud separation distances and densities. The ambient conditions are similar to those surrounding the MS but also relevant to the circumgalactic medium and intergalactic medium. Ten simulations are presented, eight of which model clustered clouds and two of which model isolated clouds. The isolated clouds are used as controls for comparison with the multicloud simulations. We find that if the clouds are initially near each other, then hydrodynamical drafting helps the trailing cloud to catch the leading cloud and mix together. We present the measured acceleration due to drafting and find that lower-density clouds in lower-density environments experience more acceleration due to drafting than their denser cohorts. We find that the clustering of clouds also increases the condensation of ambient material and affects longevity. We analyze the velocity dispersion of the clouds using a single component method and a multicomponent decomposition method. We find that the presence of a second cloud increases the velocity dispersion behind the trailing cloud at some times. We find that the velocity dispersion due to gas motion in our simulations is significantly less than the actual dispersion observed by For et al., indicating that the thermal component must dominate in the MS.

Submitted 10 February, 2022; originally announced February 2022.

Journal ref: The Astrophysical Journal 926 (2022) 36-51

arXiv:2201.07789 [pdf, other]

doi 10.1140/epjqt/s40507-022-00147-w

Cold Atoms in Space: Community Workshop Summary and Proposed Road-Map

Authors: Ivan Alonso, Cristiano Alpigiani, Brett Altschul, Henrique Araujo, Gianluigi Arduini, Jan Arlt, Leonardo Badurina, Antun Balaz, Satvika Bandarupally, Barry C Barish, Michele Barone, Michele Barsanti, Steven Bass, Angelo Bassi, Baptiste Battelier, Charles F. A. Baynham, Quentin Beaufils, Aleksandar Belic, Joel Berge, Jose Bernabeu, Andrea Bertoldi, Robert Bingham, Sebastien Bize, Diego Blas, Kai Bongs, Philippe Bouyer, et al. (224 additional authors not shown)

Abstract: We summarize the discussions at a virtual Community Workshop on Cold Atoms in Space concerning the status of cold atom technologies, the prospective scientific and societal opportunities offered by their deployment in space, and the developments needed before cold atoms could be operated in space. The cold atom technologies discussed include atomic clocks, quantum gravimeters and accelerometers, and atom interferometers. Prospective applications include metrology, geodesy and measurement of terrestrial mass change due to, e.g., climate change, and fundamental science experiments such as tests of the equivalence principle, searches for dark matter, measurements of gravitational waves and tests of quantum mechanics. We review the current status of cold atom technologies and outline the requirements for their space qualification, including the development paths and the corresponding technical milestones, and identifying possible pathfinder missions to pave the way for missions to exploit the full potential of cold atoms in space. Finally, we present a first draft of a possible road-map for achieving these goals, that we propose for discussion by the interested cold atom, Earth Observation, fundamental physics and other prospective scientific user communities, together with ESA and national space and research funding agencies.

Submitted 19 January, 2022; originally announced January 2022.

Comments: Summary of the Community Workshop on Cold Atoms in Space and corresponding Road-map: https://indico.cern.ch/event/1064855/

Journal ref: EPJ Quantum Technol. 9, 30 (2022)

arXiv:2112.03277 [pdf]

Automatic quality control framework for more reliable integration of machine learning-based image segmentation into medical workflows

Authors: Elena Williams, Sebastian Niehaus, Janis Reinelt, Alberto Merola, Paul Glad Mihai, Kersten Villringer, Konstantin Thierbach, Evelyn Medawar, Daniel Lichterfeld, Ingo Roeder, Nico Scherf, Maria del C. Valdés Hernández

Abstract: Machine learning algorithms underpin modern diagnostic-aiding software, which has proved valuable in clinical practice, particularly in radiology. However, inaccuracies, mainly due to the limited availability of clinical samples for training these algorithms, hamper their wider applicability, acceptance, and recognition amongst clinicians. We present an analysis of state-of-the-art automatic quality control (QC) approaches that can be implemented within these algorithms to estimate the certainty of their outputs. We validated the most promising approaches on a brain image segmentation task identifying white matter hyperintensities (WMH) in magnetic resonance imaging data. WMH are a correlate of small vessel disease common in mid-to-late adulthood and are particularly challenging to segment due to their varied size and distributional patterns. Our results show that the aggregation of uncertainty and Dice prediction were most effective in failure detection for this task. Both methods independently improved mean Dice from 0.82 to 0.84. Our work reveals how QC methods can help to detect failed segmentation cases and therefore make automatic segmentation more reliable and suitable for clinical practice.

Submitted 19 December, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

Comments: 19 pages

arXiv:2111.09650 [pdf]

Whole Heart Anatomical Refinement from CCTA using Extrapolation and Parcellation

Authors: Hao Xu, Steven A. Niederer, Steven E. Williams, David E. Newby, Michelle C. Williams, Alistair A. Young

Abstract: Coronary computed tomography angiography (CCTA) provides detailed anatomical information on all chambers of the heart. Existing segmentation tools can label the gross anatomy, but addition of application-specific labels can require detailed and often manual refinement. We developed a U-Net based framework to i) extrapolate a new label from existing labels, and ii) parcellate one label into multiple labels, both using label-to-label mapping, to create a desired segmentation that could then be learnt directly from the image (image-to-label mapping). This approach only required manual correction in a small subset of cases (80 for extrapolation, 50 for parcellation, compared with 260 for initial labels). An initial 6-label segmentation (left ventricle, left ventricular myocardium, right ventricle, left atrium, right atrium and aorta) was refined to a 10-label segmentation that added a label for the pulmonary artery and divided the left atrium label into body, left and right veins and appendage components. The final method was tested using 30 cases, 10 each from Philips, Siemens and Toshiba scanners. In addition to the new labels, the median Dice scores were improved for all the initial 6 labels to be above 95% in the 10-label segmentation, e.g. from 91% to 97% for the left atrium body and from 92% to 96% for the right ventricle. This method provides a simple framework for flexible refinement of anatomical labels. The code and executables are available at cemrg.com.

Submitted 18 November, 2021; originally announced November 2021.

Comments: 9 pages, 5 figures, presented at Functional Imaging and Modeling of the Heart 2021

arXiv:2110.14275 [pdf, other]

doi 10.1093/mnras/stab3158

Study of Chemically Peculiar Stars-I : High-resolution Spectroscopy and K2 Photometry of Am Stars in the Region of M44

Authors: Santosh Joshi, Otto Trust, E. Semenko, P. E. Williams, P. Lampens, P. De Cat, L. Vermeylen, D. L. Holdsworth, R. A. García, S. Mathur, A. R. G. Santos, D. Mkrtichian, A. Goswami, M. Cuntz, A. P. Yadav, M. Sarkar, B. C. Bhatt, F. Kahraman Aliçavuş, M. D. Nhlapo, M. N. Lund, P. P. Goswami, I. Savanov, A. Jorissen, E. Jurua, E. Avvakumova, et al. (8 additional authors not shown)

Abstract: We present a study based on the high-resolution spectroscopy and K2 space photometry of five chemically peculiar stars in the region of the open cluster M44. The analysis of the high-precision photometric K2 data reveals that the light variations in HD 73045 and HD 76310 are rotational in nature and caused by spots or cloud-like co-rotating structures, which are non-stationary and short-lived. The time-resolved radial velocity measurements, in combination with the K2 photometry, confirm that HD 73045 does not show any periodic variability on timescales shorter than 1.3 d, contrary to previous reports in the literature. In addition to these new rotational variables, we discovered a new heartbeat system, HD 73619, where no pulsational signatures are seen. The spectroscopic and spectropolarimetric analyses indicate that HD 73619 belongs to the peculiar Am class, with either a weak or no magnetic field considering the 200 G detection limit of our study. The Least-Squares Deconvolution (LSD) profiles for HD 76310 indicate a complex structure in its spectra suggesting that this star is either part of a binary system or surrounded by a cloud shell. When placed in the Hertzsprung-Russell diagram, all studied stars are evolved from the main sequence and situated in the $\delta$ Scuti instability strip. The present work is relevant for further detailed studies of CP stars, such as inhomogeneities (including spots) in the absence of magnetic fields and the origin of the pulsational variability in heartbeat systems.

Submitted 27 October, 2021; originally announced October 2021.

Comments: Accepted for publication in MNRAS

arXiv:2108.08889 [pdf, other]

doi 10.3847/1538-3881/ac0e2f

K2, Spitzer, and TESS Transits of Four Sub-Neptune Exoplanets

Authors: Alison Duck, Caleb K. Harada, Justin Harrell, Ryan R. A. Morris, Edward Williams, Ian Crossfield, Michael Werner, Drake Deming

Abstract: We present new Spitzer transit observations of four K2 transiting sub-Neptunes: K2-36c, K2-79b, K2-167b, and K2-212b. We derive updated orbital ephemerides and radii for these planets based on a joint analysis of the Spitzer, TESS, and K2 photometry. We use the EVEREST pipeline to provide improved K2 photometry, by detrending instrumental noise and K2's pointing jitter. We used a pixel level decorrelation method on the Spitzer observations to reduce instrumental systematic effects. We modeled the effect of possible blended eclipsing binaries, seeking to validate these planets via the achromaticity of the transits (K2 versus Spitzer). However, we find that Spitzer's signal-to-noise ratio for these small planets is insufficient to validate them via achromaticity. Nevertheless, by jointly fitting radii between K2 and Spitzer observations, we were able to independently confirm the K2 radius measurements. Due to the long time baseline between the K2 and Spitzer observations, we were also able to increase the precision of the orbital periods compared to K2 observations alone. The improvement is a factor of 3 for K2-36c, and more than an order of magnitude for the remaining planets. Considering possible JWST observations in 1/2023, previous 1 sigma uncertainties in transit times for these planets range from 74 to 434 minutes, but we have reduced them to the range of 8 to 23 minutes.

Submitted 19 August, 2021; originally announced August 2021.

Comments: 18 pages, 11 Figures, Accepted by The Astronomical Journal

Journal ref: AJ 162 136 (2021)

arXiv:2107.05684 [pdf, other]

Accenture at CheckThat! 2021: Interesting claim identification and ranking with contextually sensitive lexical training data augmentation

Authors: Evan Williams, Paul Rodrigues, Sieu Tran

Abstract: This paper discusses the approach used by the Accenture Team for CLEF2021 CheckThat! Lab, Task 1, to identify whether a claim made in social media would be interesting to a wide audience and should be fact-checked. Twitter training and test data were provided in English, Arabic, Spanish, Turkish, and Bulgarian. Claims were to be classified (check-worthy/not check-worthy) and ranked in priority order for the fact-checker. Our method used deep neural network transformer models with contextually sensitive lexical augmentation applied on the supplied training datasets to create additional training samples. This augmentation approach improved the performance for all languages. Overall, our architecture and data augmentation pipeline produced the best submitted system for Arabic, and performance scales according to the quantity of provided training data for English, Spanish, Turkish, and Bulgarian. This paper investigates the deep neural network architectures for each language as well as the provided data to examine why the approach worked so effectively for Arabic, and discusses additional data augmentation measures that could be useful to this problem.

Submitted 12 July, 2021; originally announced July 2021.

Comments: To Appear As: Evan Williams, Paul Rodrigues, Sieu Tran. Accenture at CheckThat! 2021: Interesting claim identification and ranking with contextually sensitive lexical training data augmentation. In: Faggioli et al. Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum. Bucharest, Romania. 21-24 September 2021

arXiv:2101.10769 [pdf]

Regression Models for Order-of-Addition Experiments

Authors: Hans-Peter Piepho, Emlyn R. Williams

Abstract: The purpose of order-of-addition (OofA) experiments is to identify the best order in a sequence of m components in a system or treatment. Such experiments may be analysed by various regression models, the most popular ones being based on pairwise ordering (PWO) factors or on component-position (CP) factors. This paper reviews these models and extensions and proposes a new class of models based on response surface (RS) regression using component position numbers as predictor variables. Using two published examples, it is shown that RS models can be quite competitive. In case of model uncertainty, we advocate the use of model averaging for analysis. The averaging idea leads naturally to a design approach based on a compound optimality criterion assigning weights to each candidate model.

Submitted 26 January, 2021; originally announced January 2021.

Comments: 25 pages, 7 Tables, 1 Figure

arXiv:2011.11570 [pdf, other]

doi 10.1109/CDC42340.2020.9304378

Direct Transcription for Dynamic Optimization: A Tutorial with a Case Study on Dual-Patient Ventilation During the COVID-19 Pandemic

Authors: Eric C. Kerrigan, Yuanbo Nie, Omar Faqir, Caroline H. Kennedy, Steven A. Niederer, Jose A. Solis-Lemus, Peter Vincent, Steven E. Williams

Abstract: A variety of optimal control, estimation, system identification and design problems can be formulated as functional optimization problems with differential equality and inequality constraints. Since these problems are infinite-dimensional and often do not have a known analytical solution, one has to resort to numerical methods to compute an approximate solution. This paper uses a unifying notation to outline some of the techniques used in the transcription step of simultaneous direct methods (which discretize-then-optimize) for solving continuous-time dynamic optimization problems. We focus on collocation, integrated residual and Runge-Kutta schemes. These transcription methods are then applied to a simulation case study to answer a question that arose during the COVID-19 pandemic, namely: If there are not enough ventilators, is it possible to ventilate more than one patient on a single ventilator? The results suggest that it is possible, in principle, to estimate individual patient parameters sufficiently accurately, using a relatively small number of flow rate measurements, without needing to disconnect a patient from the system or needing more than one flow rate sensor. We also show that it is possible to ensure that two different patients can indeed receive their desired tidal volume, by modifying the resistance experienced by the air flow to each patient and controlling the ventilator pressure.

Submitted 23 November, 2020; originally announced November 2020.

Comments: Accepted to 59th IEEE Conference on Decision and Control, Jeju Island, Republic of Korea, December 14th-18th 2020

Journal ref: 2020 59th IEEE Conference on Decision and Control (CDC)

arXiv:2009.06938 [pdf, other]

doi 10.23731/CYRM-2020-008

A primary electron beam facility at CERN -- eSPS Conceptual design report

Authors: M. Aicheler, T. Akesson, F. Antoniou, A. Arnalich, P. A. Arrutia Sota, P. Bettencourt Moniz Cabral, D. Bozzini, M. Brugger, O. Brunner, P. N. Burrows, R. Calaga, M. J. Capstick, R. Corsini, S. Doebert, L. A. Dougherty, Y. Dutheil, L. A. Dyks, O. Etisken, L. Evans, A. Farricker, R. Fernandez Ortega, M. A. Fraser, J. Gall, S. J. Gessner, B. Goddard , et al. (30 additional authors not shown)

Abstract: The design of a primary electron beam facility at CERN is described. The study has been carried out within the framework of the wider Physics Beyond Colliders study. It re-enables the Super Proton Synchrotron (SPS) as an electron accelerator, and leverages the development invested in Compact Linear Collider (CLIC) technology for its injector and as an accelerator research and development infrastructure. The facility would be relevant for several of the key priorities in the 2020 update of the European Strategy for Particle Physics, such as an electron-positron Higgs factory, accelerator R&D, dark sector physics, and neutrino physics. In addition, it could serve experiments in nuclear physics. The electron beam delivered by this facility would provide access to light dark matter production significantly beyond the targets predicted by a thermal dark matter origin, and for natures of dark matter particles that are not accessible by direct detection experiments. It would also enable electro-nuclear measurements crucial for precisely modelling the energy dependence of neutrino-nucleus interactions, which is needed to precisely measure neutrino oscillations as a function of energy. The implementation of the facility is the natural next step in the development of X-band high-gradient acceleration technology, a key technology for compact and cost-effective electron/positron linacs. It would also become the only facility with multi-GeV drive bunches and truly independent electron witness bunches for plasma wakefield acceleration.
A second phase capable of delivering positron witness bunches would make it a complete facility for plasma wakefield collider studies. [...]

Submitted 21 December, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

arXiv:2009.02431 [pdf, ps, other]

Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models

Authors: Evan Williams, Paul Rodrigues, Valerie Novak

Abstract: We introduce the strategies used by the Accenture Team for the CLEF2020 CheckThat! Lab, Task 1, on English and Arabic. This shared task evaluated whether a claim in social media text should be professionally fact-checked. To a journalist, a statement presented as fact, which would be of interest to a large audience, requires professional fact-checking before dissemination. We utilized BERT and RoBERTa models to identify claims in social media text that a professional fact-checker should review, and to rank these in priority order for the fact-checker. For the English challenge, we fine-tuned a RoBERTa model and added an extra mean pooling layer and a dropout layer to enhance generalizability to unseen text. For the Arabic task, we fine-tuned Arabic-language BERT models and demonstrated the use of back-translation to amplify the minority class and balance the dataset. The work presented here placed 1st in the English track, and 1st, 2nd, 3rd, and 4th in the Arabic track.

Submitted 4 September, 2020; originally announced September 2020.

Comments: To Appear As: Evan Williams, Paul Rodrigues, Valerie Novak. Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models. In: Cappellato et al. Working Notes of CLEF 2020-Conference and Labs of the Evaluation Forum. Thessaloniki, Greece. 22-25 September 2020

Showing 1–50 of 249 results for author: Williams, E