-
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
Authors:
Alejandro Lozano,
Jeffrey Nirschl,
James Burgess,
Sanket Rajan Gupte,
Yuhui Zhang,
Alyssa Unell,
Serena Yeung-Levy
Abstract:
Recent advances in microscopy have enabled the rapid generation of terabytes of image data in cell biology and biomedical research. Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis, enhancing researchers' efficiency, identifying new image biomarkers, and accelerating hypothesis generation and scientific discovery. However, there is a lack of standa…
▽ More
Recent advances in microscopy have enabled the rapid generation of terabytes of image data in cell biology and biomedical research. Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis, enhancing researchers' efficiency, identifying new image biomarkers, and accelerating hypothesis generation and scientific discovery. However, there is a lack of standardized, diverse, and large-scale vision-language benchmarks to evaluate VLMs' perception and cognition capabilities in biological image understanding. To address this gap, we introduce μ-Bench, an expert-curated benchmark encompassing 22 biomedical tasks across various scientific disciplines (biology, pathology), microscopy modalities (electron, fluorescence, light), scales (subcellular, cellular, tissue), and organisms in both normal and abnormal states. We evaluate state-of-the-art biomedical, pathology, and general VLMs on μ-Bench and find that: i) current models struggle on all categories, even for basic tasks such as distinguishing microscopy modalities; ii) current specialist models fine-tuned on biomedical data often perform worse than generalist models; iii) fine-tuning in specific microscopy domains can cause catastrophic forgetting, eroding prior biomedical knowledge encoded in their base model. iv) weight interpolation between fine-tuned and pre-trained models offers one solution to forgetting and improves general performance across biomedical tasks. We release μ-Bench under a permissive license to accelerate the research and development of microscopy foundation models.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Measuring Misogyny in Natural Language Generation: Preliminary Results from a Case Study on two Reddit Communities
Authors:
Aaron J. Snoswell,
Lucinda Nelson,
Hao Xue,
Flora D. Salim,
Nicolas Suzor,
Jean Burgess
Abstract:
Generic `toxicity' classifiers continue to be used for evaluating the potential for harm in natural language generation, despite mounting evidence of their shortcomings. We consider the challenge of measuring misogyny in natural language generation, and argue that generic `toxicity' classifiers are inadequate for this task. We use data from two well-characterised `Incel' communities on Reddit that…
▽ More
Generic `toxicity' classifiers continue to be used for evaluating the potential for harm in natural language generation, despite mounting evidence of their shortcomings. We consider the challenge of measuring misogyny in natural language generation, and argue that generic `toxicity' classifiers are inadequate for this task. We use data from two well-characterised `Incel' communities on Reddit that differ primarily in their degrees of misogyny to construct a pair of training corpora which we use to fine-tune two language models. We show that an open source `toxicity' classifier is unable to distinguish meaningfully between generations from these models. We contrast this with a misogyny-specific lexicon recently proposed by feminist subject-matter experts, demonstrating that, despite the limitations of simple lexicon-based approaches, this shows promise as a benchmark to evaluate language models for misogyny, and that it is sensitive enough to reveal the known differences in these Reddit communities. Our preliminary findings highlight the limitations of a generic approach to evaluating harms, and further emphasise the need for careful benchmark design and selection in natural language evaluation.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models
Authors:
James Burgess,
Kuan-Chieh Wang,
Serena Yeung
Abstract:
Text-to-image diffusion models understand spatial relationship between objects, but do they represent the true 3D structure of the world from only 2D supervision? We demonstrate that yes, 3D knowledge is encoded in 2D image diffusion models like Stable Diffusion, and we show that this structure can be exploited for 3D vision tasks. Our method, Viewpoint Neural Textual Inversion (ViewNeTI), control…
▽ More
Text-to-image diffusion models understand spatial relationship between objects, but do they represent the true 3D structure of the world from only 2D supervision? We demonstrate that yes, 3D knowledge is encoded in 2D image diffusion models like Stable Diffusion, and we show that this structure can be exploited for 3D vision tasks. Our method, Viewpoint Neural Textual Inversion (ViewNeTI), controls the 3D viewpoint of objects in generated images from frozen diffusion models. We train a small neural mapper to take camera viewpoint parameters and predict text encoder latents; the latents then condition the diffusion generation process to produce images with the desired camera viewpoint.
ViewNeTI naturally addresses Novel View Synthesis (NVS). By leveraging the frozen diffusion model as a prior, we can solve NVS with very few input views; we can even do single-view novel view synthesis. Our single-view NVS predictions have good semantic details and photorealism compared to prior methods. Our approach is well suited for modeling the uncertainty inherent in sparse 3D vision problems because it can efficiently generate diverse samples. Our view-control mechanism is general, and can even change the camera view in images generated by user-defined prompts.
△ Less
Submitted 14 September, 2023;
originally announced September 2023.
-
Malware and Exploits on the Dark Web
Authors:
Jonah Burgess
Abstract:
In recent years, the darknet has become the key location for the distribution of malware and exploits. We have seen scenarios where software vulnerabilities have been disclosed by vendors and shortly after, operational exploits are available on darknet forums and marketplaces. Many marketplace vendors offer zero-day exploits that have not yet been discovered or disclosed. This trend has led to sec…
▽ More
In recent years, the darknet has become the key location for the distribution of malware and exploits. We have seen scenarios where software vulnerabilities have been disclosed by vendors and shortly after, operational exploits are available on darknet forums and marketplaces. Many marketplace vendors offer zero-day exploits that have not yet been discovered or disclosed. This trend has led to security companies offering darknet analysis services to detect new exploits and malware, providing proactive threat intelligence. This paper presents information on the scale of malware distribution, the trends of malware types offered, the methods for discovering new exploits and the effectiveness of darknet analysis in detecting malware at the earliest possible stage.
△ Less
Submitted 30 September, 2022;
originally announced November 2022.
-
Modern DDoS Attacks and Defences -- Survey
Authors:
Jonah Burgess
Abstract:
Denial of Service (DoS) and Distributed Denial of Service of Service (DDoS) attacks are commonly used to disrupt network services. Attack techniques are always improving and due to the structure of the internet and properties of network protocols it is difficult to keep detection and mitigation techniques up to date. A lot of research has been conducted in this area which has demonstrated the diff…
▽ More
Denial of Service (DoS) and Distributed Denial of Service of Service (DDoS) attacks are commonly used to disrupt network services. Attack techniques are always improving and due to the structure of the internet and properties of network protocols it is difficult to keep detection and mitigation techniques up to date. A lot of research has been conducted in this area which has demonstrated the difficulty of preventing DDoS attacks altogether, therefore the primary aim of most research is to maximize quality of service (QoS) for legitimate users. This survey paper aims to provide a clear summary of DDoS attacks and focuses on some recently proposed techniques for defence. The research papers that are analysed in depth primarily focused on the use of virtual machines (VMs) (HoneyMesh) and network function virtualization (NFV) (VGuard and VFence).
△ Less
Submitted 30 September, 2022;
originally announced November 2022.
-
Integrated Weak Learning
Authors:
Peter Hayes,
Mingtian Zhang,
Raza Habib,
Jordan Burgess,
Emine Yilmaz,
David Barber
Abstract:
We introduce Integrated Weak Learning, a principled framework that integrates weak supervision into the training process of machine learning models. Our approach jointly trains the end-model and a label model that aggregates multiple sources of weak supervision. We introduce a label model that can learn to aggregate weak supervision sources differently for different datapoints and takes into consi…
▽ More
We introduce Integrated Weak Learning, a principled framework that integrates weak supervision into the training process of machine learning models. Our approach jointly trains the end-model and a label model that aggregates multiple sources of weak supervision. We introduce a label model that can learn to aggregate weak supervision sources differently for different datapoints and takes into consideration the performance of the end-model during training. We show that our approach outperforms existing weak learning techniques across a set of 6 benchmark classification datasets. When both a small amount of labeled data and weak supervision are present the increase in performance is both consistent and large, reliably getting a 2-5 point test F1 score gain over non-integrated methods.
△ Less
Submitted 19 June, 2022;
originally announced June 2022.
-
Sample Efficient Model Evaluation
Authors:
Emine Yilmaz,
Peter Hayes,
Raza Habib,
Jordan Burgess,
David Barber
Abstract:
Labelling data is a major practical bottleneck in training and testing classifiers. Given a collection of unlabelled data points, we address how to select which subset to label to best estimate test metrics such as accuracy, $F_1$ score or micro/macro $F_1$. We consider two sampling based approaches, namely the well-known Importance Sampling and we introduce a novel application of Poisson Sampling…
▽ More
Labelling data is a major practical bottleneck in training and testing classifiers. Given a collection of unlabelled data points, we address how to select which subset to label to best estimate test metrics such as accuracy, $F_1$ score or micro/macro $F_1$. We consider two sampling based approaches, namely the well-known Importance Sampling and we introduce a novel application of Poisson Sampling. For both approaches we derive the minimal error sampling distributions and how to approximate and use them to form estimators and confidence intervals. We show that Poisson Sampling outperforms Importance Sampling both theoretically and experimentally.
△ Less
Submitted 24 September, 2021;
originally announced September 2021.
-
A Semantic Framework for Enabling Radio Spectrum Policy Management and Evaluation
Authors:
H. Santos,
A. Mulvehill,
J. S. Erickson,
J. P. McCusker,
M. Gordon,
O. Xie,
S. Stouffer,
G. Capraro,
A. Pidwerbetsky,
J. Burgess,
A. Berlinsky,
K. Turck,
J. Ashdown,
D. L. McGuinness
Abstract:
Because radio spectrum is a finite resource, its usage and sharing is regulated by government agencies. These agencies define policies to manage spectrum allocation and assignment across multiple organizations, systems, and devices. With more portions of the radio spectrum being licensed for commercial use, the importance of providing an increased level of automation when evaluating such policies…
▽ More
Because radio spectrum is a finite resource, its usage and sharing is regulated by government agencies. These agencies define policies to manage spectrum allocation and assignment across multiple organizations, systems, and devices. With more portions of the radio spectrum being licensed for commercial use, the importance of providing an increased level of automation when evaluating such policies becomes crucial for the efficiency and efficacy of spectrum management. We introduce our Dynamic Spectrum Access Policy Framework for supporting the United States government's mission to enable both federal and non-federal entities to compatibly utilize available spectrum. The DSA Policy Framework acts as a machine-readable policy repository providing policy management features and spectrum access request evaluation. The framework utilizes a novel policy representation using OWL and PROV-O along with a domain-specific reasoning implementation that mixes GeoSPARQL, OWL reasoning, and knowledge graph traversal to evaluate incoming spectrum access requests and explain how applicable policies were used. The framework is currently being used to support live, over-the-air field exercises involving a diverse set of federal and commercial radios, as a component of a prototype spectrum management system.
△ Less
Submitted 8 November, 2020;
originally announced November 2020.
-
One-Shot Learning in Discriminative Neural Networks
Authors:
Jordan Burgess,
James Robert Lloyd,
Zoubin Ghahramani
Abstract:
We consider the task of one-shot learning of visual categories. In this paper we explore a Bayesian procedure for updating a pretrained convnet to classify a novel image category for which data is limited. We decompose this convnet into a fixed feature extractor and softmax classifier. We assume that the target weights for the new task come from the same distribution as the pretrained softmax weig…
▽ More
We consider the task of one-shot learning of visual categories. In this paper we explore a Bayesian procedure for updating a pretrained convnet to classify a novel image category for which data is limited. We decompose this convnet into a fixed feature extractor and softmax classifier. We assume that the target weights for the new task come from the same distribution as the pretrained softmax weights, which we model as a multivariate Gaussian. By using this as a prior for the new weights, we demonstrate competitive performance with state-of-the-art methods whilst also being consistent with 'normal' methods for training deep networks on large data.
△ Less
Submitted 18 July, 2017;
originally announced July 2017.