-
RFRL Gym: A Reinforcement Learning Testbed for Cognitive Radio Applications
Authors:
Daniel Rosen,
Illa Rochez,
Caleb McIrvin,
Joshua Lee,
Kevin D'Alessandro,
Max Wiecek,
Nhan Hoang,
Ramzy Saffarini,
Sam Philips,
Vanessa Jones,
Will Ivey,
Zavier Harris-Smart,
Zavion Harris-Smart,
Zayden Chin,
Amos Johnson,
Alyse M. Jones,
William C. Headley
Abstract:
Radio Frequency Reinforcement Learning (RFRL) is anticipated to be a widely applicable technology in the next generation of wireless communication systems, particularly 6G and next-gen military communications. Given this, our research is focused on develo** a tool to promote the development of RFRL techniques that leverage spectrum sensing. In particular, the tool was designed to address two cog…
▽ More
Radio Frequency Reinforcement Learning (RFRL) is anticipated to be a widely applicable technology in the next generation of wireless communication systems, particularly 6G and next-gen military communications. Given this, our research is focused on develo** a tool to promote the development of RFRL techniques that leverage spectrum sensing. In particular, the tool was designed to address two cognitive radio applications, specifically dynamic spectrum access and jamming. In order to train and test reinforcement learning (RL) algorithms for these applications, a simulation environment is necessary to simulate the conditions that an agent will encounter within the Radio Frequency (RF) spectrum. In this paper, such an environment has been developed, herein referred to as the RFRL Gym. Through the RFRL Gym, users can design their own scenarios to model what an RL agent may encounter within the RF spectrum as well as experiment with different spectrum sensing techniques. Additionally, the RFRL Gym is a subclass of OpenAI gym, enabling the use of third-party ML/RL Libraries. We plan to open-source this codebase to enable other researchers to utilize the RFRL Gym to test their own scenarios and RL algorithms, ultimately leading to the advancement of RL research in the wireless communications domain. This paper describes in further detail the components of the Gym, results from example scenarios, and plans for future additions.
Index Terms-machine learning, reinforcement learning, wireless communications, dynamic spectrum access, OpenAI gym
△ Less
Submitted 20 December, 2023;
originally announced January 2024.
-
Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where
Authors:
Zhi-Yi Chin,
Chieh-Ming Jiang,
Ching-Chun Huang,
Pin-Yu Chen,
Wei-Chen Chiu
Abstract:
While image data starts to enjoy the simple-but-effective self-supervised learning scheme built upon masking and self-reconstruction objective thanks to the introduction of tokenization procedure and vision transformer backbone, convolutional neural networks as another important and widely-adopted architecture for image data, though having contrastive-learning techniques to drive the self-supervis…
▽ More
While image data starts to enjoy the simple-but-effective self-supervised learning scheme built upon masking and self-reconstruction objective thanks to the introduction of tokenization procedure and vision transformer backbone, convolutional neural networks as another important and widely-adopted architecture for image data, though having contrastive-learning techniques to drive the self-supervised learning, still face the difficulty of leveraging such straightforward and general masking operation to benefit their learning process significantly. In this work, we aim to alleviate the burden of including masking operation into the contrastive-learning framework for convolutional neural networks as an extra augmentation method. In addition to the additive but unwanted edges (between masked and unmasked regions) as well as other adverse effects caused by the masking operations for ConvNets, which have been discussed by prior works, we particularly identify the potential problem where for one view in a contrastive sample-pair the randomly-sampled masking regions could be overly concentrated on important/salient objects thus resulting in misleading contrastiveness to the other view. To this end, we propose to explicitly take the saliency constraint into consideration in which the masked regions are more evenly distributed among the foreground and background for realizing the masking-based augmentation. Moreover, we introduce hard negative samples by masking larger regions of salient patches in an input image. Extensive experiments conducted on various datasets, contrastive learning mechanisms, and downstream tasks well verify the efficacy as well as the superior performance of our proposed method with respect to several state-of-the-art baselines.
△ Less
Submitted 8 June, 2024; v1 submitted 22 September, 2023;
originally announced September 2023.
-
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
Authors:
Zhi-Yi Chin,
Chieh-Ming Jiang,
Ching-Chun Huang,
Pin-Yu Chen,
Wei-Chen Chiu
Abstract:
Text-to-image diffusion models, e.g. Stable Diffusion (SD), lately have shown remarkable ability in high-quality content generation, and become one of the representatives for the recent wave of transformative AI. Nevertheless, such advance comes with an intensifying concern about the misuse of this generative technology, especially for producing copyrighted or NSFW (i.e. not safe for work) images.…
▽ More
Text-to-image diffusion models, e.g. Stable Diffusion (SD), lately have shown remarkable ability in high-quality content generation, and become one of the representatives for the recent wave of transformative AI. Nevertheless, such advance comes with an intensifying concern about the misuse of this generative technology, especially for producing copyrighted or NSFW (i.e. not safe for work) images. Although efforts have been made to filter inappropriate images/prompts or remove undesirable concepts/styles via model fine-tuning, the reliability of these safety mechanisms against diversified problematic prompts remains largely unexplored. In this work, we propose Prompting4Debugging (P4D) as a debugging and red-teaming tool that automatically finds problematic prompts for diffusion models to test the reliability of a deployed safety mechanism. We demonstrate the efficacy of our P4D tool in uncovering new vulnerabilities of SD models with safety mechanisms. Particularly, our result shows that around half of prompts in existing safe prompting benchmarks which were originally considered "safe" can actually be manipulated to bypass many deployed safety mechanisms, including concept removal, negative prompt, and safety guidance. Our findings suggest that, without comprehensive testing, the evaluations on limited safe prompting benchmarks can lead to a false sense of safety for text-to-image models.
△ Less
Submitted 8 June, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models
Authors:
Sarah J. Zhang,
Samuel Florin,
Ariel N. Lee,
Eamon Niknafs,
Andrei Marginean,
Annie Wang,
Keith Tyser,
Zad Chin,
Yann Hicke,
Nikhil Singh,
Madeleine Udell,
Yoon Kim,
Tonio Buonassisi,
Armando Solar-Lezama,
Iddo Drori
Abstract:
We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that…
▽ More
We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set excluding questions based on images. We fine-tune an open-source large language model on this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed performance breakdown by course, question, and answer type. By embedding questions in a low-dimensional space, we explore the relationships between questions, topics, and classes and discover which questions and classes are required for solving other questions and classes through few-shot learning. Our analysis offers valuable insights into course prerequisites and curriculum design, highlighting language models' potential for learning and improving Mathematics and EECS education.
△ Less
Submitted 24 June, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark
Authors:
Vitali Petsiuk,
Alexander E. Siemenn,
Saisamrit Surbehera,
Zad Chin,
Keith Tyser,
Gregory Hunter,
Arvind Raghavan,
Yann Hicke,
Bryan A. Plummer,
Ori Kerret,
Tonio Buonassisi,
Kate Saenko,
Armando Solar-Lezama,
Iddo Drori
Abstract:
We provide a new multi-task benchmark for evaluating text-to-image models. We perform a human evaluation comparing the most common open-source (Stable Diffusion) and commercial (DALL-E 2) models. Twenty computer science AI graduate students evaluated the two models, on three tasks, at three difficulty levels, across ten prompts each, providing 3,600 ratings. Text-to-image generation has seen rapid…
▽ More
We provide a new multi-task benchmark for evaluating text-to-image models. We perform a human evaluation comparing the most common open-source (Stable Diffusion) and commercial (DALL-E 2) models. Twenty computer science AI graduate students evaluated the two models, on three tasks, at three difficulty levels, across ten prompts each, providing 3,600 ratings. Text-to-image generation has seen rapid progress to the point that many recent models have demonstrated their ability to create realistic high-resolution images for various prompts. However, current text-to-image methods and the broader body of research in vision-language understanding still struggle with intricate text prompts that contain many objects with multiple attributes and relationships. We introduce a new text-to-image benchmark that contains a suite of thirty-two tasks over multiple applications that capture a model's ability to handle different features of a text prompt. For example, asking a model to generate a varying number of the same object to measure its ability to count or providing a text prompt with several objects that each have a different attribute to identify its ability to match objects and attributes correctly. Rather than subjectively evaluating text-to-image results on a set of prompts, our new multi-task benchmark consists of challenge tasks at three difficulty levels (easy, medium, and hard) and human ratings for each generated image.
△ Less
Submitted 22 November, 2022;
originally announced November 2022.
-
From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams
Authors:
Iddo Drori,
Sarah J. Zhang,
Reece Shuttleworth,
Sarah Zhang,
Keith Tyser,
Zad Chin,
Pedro Lantigua,
Saisamrit Surbehera,
Gregory Hunter,
Derek Austin,
Leonard Tang,
Yann Hicke,
Sage Simhon,
Sathwik Karnik,
Darnell Granberry,
Madeleine Udell
Abstract:
A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level, on finals available online after the models were trained, and automatically generate new human-quality final exam questions in seconds. Previous work has de…
▽ More
A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level, on finals available online after the models were trained, and automatically generate new human-quality final exam questions in seconds. Previous work has developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We curate a dataset and benchmark of questions from machine learning final exams available online and code for answering these questions and generating new questions. We show how to generate new questions from other questions and course notes. For reproducibility and future research on this final exam benchmark, we use automatic checkers for multiple-choice, numeric, and questions with expression answers. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best. We highlight the transformative potential of language models to streamline the writing and solution of large-scale assessments, significantly reducing the workload from human days to mere machine seconds. Our results suggest that rather than banning large language models such as ChatGPT in class, instructors should teach students to harness them by asking students meta-questions about correctness, completeness, and originality of the responses generated, encouraging critical thinking in academic studies.
△ Less
Submitted 28 June, 2023; v1 submitted 11 June, 2022;
originally announced June 2022.
-
Explicit Interaction Model towards Text Classification
Authors:
Cunxiao Du,
Zhaozheng Chin,
Fuli Feng,
Lei Zhu,
Tian Gan,
Liqiang Nie
Abstract:
Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite of the significance of deep models, they ignore the fine-grained (matching signals between words and classes) classification clues since their classifications mainly rely on the tex…
▽ More
Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite of the significance of deep models, they ignore the fine-grained (matching signals between words and classes) classification clues since their classifications mainly rely on the text-level representations. To address this problem, we introduce the interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed as EXAM), equipped with the interaction mechanism. We justified the proposed approach on several benchmark datasets including both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the codes and parameter settings to facilitate other researches.
△ Less
Submitted 23 November, 2018;
originally announced November 2018.