-
Introducing v0.5 of the AI Safety Benchmark from MLCommons
Authors:
Bertie Vidgen,
Adarsh Agrawal,
Ahmed M. Ahmed,
Victor Akinwande,
Namir Al-Nuaimi,
Najla Alfaraj,
Elie Alhajjar,
Lora Aroyo,
Trupti Bavalatti,
Max Bartolo,
Borhane Blili-Hamelin,
Kurt Bollacker,
Rishi Bomassani,
Marisa Ferrara Boston,
Siméon Campos,
Kal Chakra,
Canyu Chen,
Cody Coleman,
Zacharie Delpierre Coudert,
Leon Derczynski,
Debojyoti Dutta,
Ian Eisenberg,
James Ezick,
Heather Frase,
Brian Fuller
, et al. (75 additional authors not shown)
Abstract:
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-pu…
▽ More
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
△ Less
Submitted 13 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
One Category One Prompt: Dataset Distillation using Diffusion Models
Authors:
Ali Abbasi,
Ashkan Shahbazi,
Hamed Pirsiavash,
Soheil Kolouri
Abstract:
The extensive amounts of data required for training deep neural networks pose significant challenges on storage and transmission fronts. Dataset distillation has emerged as a promising technique to condense the information of massive datasets into a much smaller yet representative set of synthetic samples. However, traditional dataset distillation approaches often struggle to scale effectively wit…
▽ More
The extensive amounts of data required for training deep neural networks pose significant challenges on storage and transmission fronts. Dataset distillation has emerged as a promising technique to condense the information of massive datasets into a much smaller yet representative set of synthetic samples. However, traditional dataset distillation approaches often struggle to scale effectively with high-resolution images and more complex architectures due to the limitations in bi-level optimization. Recently, several works have proposed exploiting knowledge distillation with decoupled optimization schemes to scale up dataset distillation. Although these methods effectively address the scalability issue, they rely on extensive image augmentations requiring the storage of soft labels for augmented images. In this paper, we introduce Dataset Distillation using Diffusion Models (D3M) as a novel paradigm for dataset distillation, leveraging recent advancements in generative text-to-image foundation models. Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets. By employing these learned text prompts, we can efficiently store and infer new samples for introducing data variability within a fixed memory budget. We show the effectiveness of our method through extensive experiments across various computer vision benchmark datasets with different memory budgets.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
Stereographic Spherical Sliced Wasserstein Distances
Authors:
Huy Tran,
Yikun Bai,
Abihith Kothapalli,
Ashkan Shahbazi,
Xinran Liu,
Rocio Diaz Martin,
Soheil Kolouri
Abstract:
Comparing spherical probability distributions is of great interest in various fields, including geology, medical domains, computer vision, and deep representation learning. The utility of optimal transport-based distances, such as the Wasserstein distance, for comparing probability measures has spurred active research in develo** computationally efficient variations of these distances for spheri…
▽ More
Comparing spherical probability distributions is of great interest in various fields, including geology, medical domains, computer vision, and deep representation learning. The utility of optimal transport-based distances, such as the Wasserstein distance, for comparing probability measures has spurred active research in develo** computationally efficient variations of these distances for spherical probability measures. This paper introduces a high-speed and highly parallelizable distance for comparing spherical measures using the stereographic projection and the generalized Radon transform, which we refer to as the Stereographic Spherical Sliced Wasserstein (S3W) distance. We carefully address the distance distortion caused by the stereographic projection and provide an extensive theoretical analysis of our proposed metric and its rotationally invariant variation. Finally, we evaluate the performance of the proposed metrics and compare them with recent baselines in terms of both speed and accuracy through a wide range of numerical studies, including gradient flows and self-supervised learning. Our code is available at https://github.com/mint-vu/s3wd.
△ Less
Submitted 9 June, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
A Strategy for Implementing description Temporal Dynamic Algorithms in Dynamic Knowledge Graphs by SPIN
Authors:
Alireza Shahbazi,
Seyyed Ahmad Mirsanei,
Malikeh Haj Khan Mirzaye Sarraf,
Behrouz Minaei Bidgoli
Abstract:
Planning and reasoning about actions and processes, in addition to reasoning about propositions, are important issues in recent logical and computer science studies. The widespread use of actions in everyday life such as IoT, semantic web services, etc., and the limitations and issues in the action formalisms are two factors that lead us to study how actions are represented.
Since 2007, there ha…
▽ More
Planning and reasoning about actions and processes, in addition to reasoning about propositions, are important issues in recent logical and computer science studies. The widespread use of actions in everyday life such as IoT, semantic web services, etc., and the limitations and issues in the action formalisms are two factors that lead us to study how actions are represented.
Since 2007, there have been some ideas to integrate Description Logic (DL) and action formalisms for representing both static and dynamic knowledge. Meanwhile, time is an important factor in dynamic situations, and actions change states over time. In this study, on the one hand, we examined related logical structures such as extensions of description logics (DLs), temporal formalisms, and action formalisms. On the other hand, we analyzed possible tools for designing and develo** the Knowledge and Action Base (KAB).
For representation and reasoning about actions, we embedded actions into DLs (such as Dynamic-ALC and its extensions). We propose a terminable algorithm for action projection, planning, checking the satisfiability, consistency, realizability, and executability, and also querying from KAB. Actions in this framework were modeled with SPIN and added to state space. This framework has also been implemented as a plugin for the Protégé ontology editor.
During the last two decades, various algorithms have been presented, but due to the high computational complexity, we face many problems in implementing dynamic ontologies. In addition, an algorithm to detect the inconsistency of actions' effects was not explicitly stated. In the proposed strategy, the interactions of actions with other parts of modeled knowledge, and a method to check consistency between the effects of actions are presented. With this framework, the ramification problem can be well handled in future works.
△ Less
Submitted 20 January, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Borhan: A Novel System for Prioritized Default Logic
Authors:
Alireza Shahbazi,
Mohammad Hossein Khojasteh,
Behrouz Minaei-Bidgoli
Abstract:
Prioritized Default Logic presents an optimal solution for addressing real-world problems characterized by incomplete information and the need to establish preferences among diverse scenarios. Although it has reached great success in the theoretical aspect, its practical implementation has received less attention. In this article, we introduce Borhan, a system designed and created for prioritized…
▽ More
Prioritized Default Logic presents an optimal solution for addressing real-world problems characterized by incomplete information and the need to establish preferences among diverse scenarios. Although it has reached great success in the theoretical aspect, its practical implementation has received less attention. In this article, we introduce Borhan, a system designed and created for prioritized default logic reasoning. To create an effective system, we have refined existing default logic definitions, including the extension concept, and introduced novel concepts. In addition to its theoretical merits, Borhan proves its practical utility by efficiently addressing a range of prioritized default logic problems. In addition, one of the advantages of our system is its ability to both store and report the explanation path for any inferred triple, enhancing transparency and interpretability. Borhan is offered as an open-source system, implemented in Python, and even offers a simplified Java version as a plugin for the Protege ontology editor. Borhan thus represents a significant step forward in bridging the gap between the theoretical foundations of default logic and its real-world applications.
△ Less
Submitted 27 October, 2023;
originally announced October 2023.
-
Equivariant vs. Invariant Layers: A Comparison of Backbone and Pooling for Point Cloud Classification
Authors:
Ashkan Shahbazi,
Abihith Kothapalli,
Xinran Liu,
Robert Sheng,
Soheil Kolouri
Abstract:
Learning from set-structured data, such as point clouds, has gained significant attention from the community. Geometric deep learning provides a blueprint for designing effective set neural networks by incorporating permutation symmetry. Of our interest are permutation invariant networks, which are composed of a permutation equivariant backbone, permutation invariant global pooling, and regression…
▽ More
Learning from set-structured data, such as point clouds, has gained significant attention from the community. Geometric deep learning provides a blueprint for designing effective set neural networks by incorporating permutation symmetry. Of our interest are permutation invariant networks, which are composed of a permutation equivariant backbone, permutation invariant global pooling, and regression/classification head. While existing literature has focused on improving permutation equivariant backbones, the impact of global pooling is often overlooked. In this paper, we examine the interplay between permutation equivariant backbones and permutation invariant global pooling on three benchmark point cloud classification datasets. Our findings reveal that: 1) complex pooling methods, such as transport-based or attention-based poolings, can significantly boost the performance of simple backbones, but the benefits diminish for more complex backbones, 2) even complex backbones can benefit from pooling layers in low data scenarios, 3) surprisingly, the choice of pooling layers can have a more significant impact on the model's performance than adjusting the width and depth of the backbone, and 4) pairwise combination of pooling layers can significantly improve the performance of a fixed backbone. Our comprehensive study provides insights for practitioners to design better permutation invariant set neural networks.
△ Less
Submitted 8 June, 2023;
originally announced June 2023.