Showing 1–24 of 24 results for author: Moraffah, R

Searching in archive cs.
  1. arXiv:2406.19417  [pdf, other]

    cs.CR cs.AI

    "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models

    Authors: Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Song Wang, Jundong Li, Tianlong Chen, Huan Liu

    Abstract: Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) by integrating external knowledge bases, improving their performance in applications like fact-checking and information searching. In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases by injecting deceptive content into the retrieval database, intentionall…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2405.04793  [pdf, other]

    cs.CL cs.AI cs.LG

    Zero-shot LLM-guided Counterfactual Generation for Text

    Authors: Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu

    Abstract: Counterfactual examples are frequently used for model development and evaluation in many natural language processing (NLP) tasks. Although methods for automated counterfactual generation have been explored, such methods depend on models such as pre-trained language models that are then fine-tuned on auxiliary, often task-specific datasets. Collecting and annotating such datasets for counterfactual…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2309.13340

  3. arXiv:2404.11036  [pdf, other]

    cs.LG cs.CL

    Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement

    Authors: Paras Sheth, Tharindu Kumarage, Raha Moraffah, Aman Chadha, Huan Liu

    Abstract: Content moderation faces a challenging task as social media's ability to spread hate speech contrasts with its role in promoting global connectivity. With rapidly evolving slang and hate speech, the adaptability of conventional deep learning to the fluid landscape of online dialogue remains limited. In response, causality inspired disentanglement has shown promise by segregating platform specific…

    Submitted 16 April, 2024; originally announced April 2024.

  4. arXiv:2403.15690  [pdf, other]

    cs.CL cs.AI cs.LG

    EAGLE: A Domain Generalization Framework for AI-generated Text Detection

    Authors: Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu

    Abstract: With the advancement in capabilities of Large Language Models (LLMs), one major step in the responsible and safe use of such LLMs is to be able to detect text generated by these models. While supervised AI-generated text detectors perform well on text generated by older LLMs, with the frequent release of new LLMs, building supervised detectors for identifying text from such new models would requir…

    Submitted 22 March, 2024; originally announced March 2024.

  5. arXiv:2403.01152  [pdf, other]

    cs.CL cs.AI

    A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization

    Authors: Tharindu Kumarage, Garima Agrawal, Paras Sheth, Raha Moraffah, Aman Chadha, Joshua Garland, Huan Liu

    Abstract: We have witnessed lately a rapid proliferation of advanced Large Language Models (LLMs) capable of generating high-quality text. While these LLMs have revolutionized text generation across various domains, they also pose significant risks to the information ecosystem, such as the potential for generating convincing propaganda, misinformation, and disinformation at scale. This paper offers a review…

    Submitted 2 March, 2024; originally announced March 2024.

  6. arXiv:2402.14859  [pdf, other]

    cs.CR cs.AI cs.CY cs.LG

    The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative

    Authors: Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Yu Kong, Tianlong Chen, Huan Liu

    Abstract: Due to their unprecedented ability to process and respond to various types of data, Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI). As these advanced generative models increasingly form collaborative networks for complex tasks, the integrity and security of these systems are crucial. Our paper, "The Wolf Within", explore…

    Submitted 2 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to workshop on ReGenAI@CVPR 2024

  7. arXiv:2402.06655  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Adversarial Text Purification: A Large Language Model Approach for Defense

    Authors: Raha Moraffah, Shubh Khandelwal, Amrita Bhattacharjee, Huan Liu

    Abstract: Adversarial purification is a defense mechanism for safeguarding classifiers against adversarial attacks without knowing the type of attacks or training of the classifier. These techniques characterize and eliminate adversarial perturbations from the attacked inputs, aiming to restore purified samples that retain similarity to the initially attacked ones and are correctly classified by the classif…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: PAKDD 2024

  8. arXiv:2402.02732  [pdf, other]

    cs.LG cs.AI cs.CR

    A Generative Approach to Surrogate-based Black-box Attacks

    Authors: Raha Moraffah, Huan Liu

    Abstract: Surrogate-based black-box attacks have exposed the heightened vulnerability of DNNs. These attacks are designed to craft adversarial examples for any samples with black-box target feedback for only a given set of samples. State-of-the-art surrogate-based attacks involve training a discriminative surrogate that mimics the target's outputs. The goal is to learn the decision boundaries of the target.…

    Submitted 5 February, 2024; originally announced February 2024.

  9. arXiv:2402.02696  [pdf, other]

    cs.LG cs.AI

    Causal Feature Selection for Responsible Machine Learning

    Authors: Raha Moraffah, Paras Sheth, Saketh Vishnubhatla, Huan Liu

    Abstract: Machine Learning (ML) has become an integral aspect of many real-world applications. As a result, the need for responsible machine learning has emerged, focusing on aligning ML models to ethical and social values, while enhancing their reliability and trustworthiness. Responsible ML involves many issues. This survey addresses four main issues: interpretability, fairness, adversarial robustness, an…

    Submitted 4 February, 2024; originally announced February 2024.

  10. arXiv:2402.02695  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Exploiting Class Probabilities for Black-box Sentence-level Attacks

    Authors: Raha Moraffah, Huan Liu

    Abstract: Sentence-level attacks craft adversarial sentences that are synonymous with correctly-classified sentences but are misclassified by the text classifiers. Under the black-box setting, classifiers are only accessible through their feedback to queried inputs, which is predominately available in the form of class probabilities. Even though utilizing class probabilities results in stronger attacks, due…

    Submitted 20 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: EACL 2024 Findings

  11. arXiv:2311.00807  [pdf, other]

    cs.CV cs.LG

    VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization

    Authors: Suraj Jyothi Unni, Raha Moraffah, Huan Liu

    Abstract: Visual question answering (VQA) models are designed to demonstrate visual-textual reasoning capabilities. However, their real-world applicability is hindered by a lack of comprehensive benchmark datasets. Existing domain generalization datasets for VQA exhibit a unilateral focus on textual shifts while VQA being a multi-modal task contains shifts across both visual and textual domains. We propose…

    Submitted 1 November, 2023; originally announced November 2023.

  12. arXiv:2310.05095  [pdf, other]

    cs.CL cs.AI

    How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts

    Authors: Tharindu Kumarage, Paras Sheth, Raha Moraffah, Joshua Garland, Huan Liu

    Abstract: In recent years, there has been a rapid proliferation of AI-generated text, primarily driven by the release of powerful pre-trained language models (PLMs). To address the issue of misuse associated with AI-generated text, various high-performing detectors have been developed, including the OpenAI detector and the Stanford DetectGPT. In our study, we ask how reliable these detectors are. We answer…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (Findings)

  13. arXiv:2309.13340  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards LLM-guided Causal Explainability for Black-box Text Classifiers

    Authors: Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu

    Abstract: With the advent of larger and more complex deep learning models, such as in Natural Language Processing (NLP), model qualities like explainability and interpretability, albeit highly desirable, are becoming harder challenges to tackle and solve. For example, state-of-the-art models in text classification are black-box by design. Although standard explanation methods provide some degree of explaina…

    Submitted 29 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Camera-ready for AAAI ReLM 2024

  14. arXiv:2309.03992  [pdf, other]

    cs.CL cs.AI cs.LG

    ConDA: Contrastive Domain Adaptation for AI-generated Text Detection

    Authors: Amrita Bhattacharjee, Tharindu Kumarage, Raha Moraffah, Huan Liu

    Abstract: Large language models (LLMs) are increasingly being used for generating text in a variety of use cases, including journalistic news articles. Given the potential malicious nature in which these LLMs can be used to generate disinformation at scale, it is important to build effective detectors for such AI-generated text. Given the surge in development of new LLMs, acquiring labeled training data for…

    Submitted 20 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Camera-ready for IJCNLP-AACL 2023 main track

  15. arXiv:2308.02080  [pdf, other]

    cs.CL cs.LG

    Causality Guided Disentanglement for Cross-Platform Hate Speech Detection

    Authors: Paras Sheth, Tharindu Kumarage, Raha Moraffah, Aman Chadha, Huan Liu

    Abstract: Social media platforms, despite their value in promoting open discourse, are often exploited to spread harmful content. Current deep learning and natural language processing models used for detecting this harmful content overly rely on domain-specific terms affecting their capabilities to adapt to generalizable hate speech detection. This is because they tend to focus too narrowly on particular li…

    Submitted 10 December, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: Accepted to WSDM'24

  16. arXiv:2306.08804  [pdf, other]

    cs.CL cs.LG

    PEACE: Cross-Platform Hate Speech Detection- A Causality-guided Framework

    Authors: Paras Sheth, Tharindu Kumarage, Raha Moraffah, Aman Chadha, Huan Liu

    Abstract: Hate speech detection refers to the task of detecting hateful content that aims at denigrating an individual or a group based on their religion, gender, sexual orientation, or other characteristics. Due to the different policies of the platforms, different groups of people express hate in different ways. Furthermore, due to the lack of labeled data in some platforms it becomes challenging to build…

    Submitted 8 October, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: ECML PKDD 2023

  17. arXiv:2209.15177  [pdf, other]

    cs.LG

    Domain Generalization -- A Causal Perspective

    Authors: Paras Sheth, Raha Moraffah, K. Selçuk Candan, Adrienne Raglin, Huan Liu

    Abstract: Machine learning models rely on various assumptions to attain high accuracy. One of the preliminary assumptions of these models is the independent and identical distribution, which suggests that the train and test data are sampled from the same distribution. However, this assumption seldom holds in the real world due to distribution shifts. As a result models that rely on this assumption exhibit p…

    Submitted 6 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

  18. arXiv:2202.02896  [pdf, other]

    cs.LG cs.AI stat.ME

    Evaluation Methods and Measures for Causal Learning Algorithms

    Authors: Lu Cheng, Ruocheng Guo, Raha Moraffah, Paras Sheth, K. Selcuk Candan, Huan Liu

    Abstract: The convenient access to copious multi-faceted data has encouraged machine learning researchers to reconsider correlation-based learning and embrace the opportunity of causality-based learning, i.e., causal machine learning (causal learning). Recent years have therefore witnessed great effort in developing causal learning algorithms aiming to help AI achieve human-level intelligence. Due to the la…

    Submitted 6 February, 2022; originally announced February 2022.

    Comments: 21 pages. Accepted to IEEE TAI

  19. arXiv:2102.05829  [pdf, other]

    cs.LG stat.ML

    Causal Inference for Time series Analysis: Problems, Methods and Evaluation

    Authors: Raha Moraffah, Paras Sheth, Mansooreh Karami, Anchit Bhattacharya, Qianru Wang, Anique Tahir, Adrienne Raglin, Huan Liu

    Abstract: Time series data is a collection of chronological observations which is generated by several domains such as medical and financial fields. Over the years, different tasks such as classification, forecasting, and clustering have been proposed to analyze this type of data. Time series data has been also used to study the effect of interventions over time. Moreover, in many fields of science, learnin…

    Submitted 10 February, 2021; originally announced February 2021.

  20. arXiv:2012.09785  [pdf, ps, other]

    cs.LG eess.SP stat.ML

    Use of Bayesian Nonparametric methods for Estimating the Measurements in High Clutter

    Authors: Bahman Moraffah, Christ Richmond, Raha Moraffah, Antonia Papandreou-Suppappola

    Abstract: Robust tracking of a target in a clutter environment is an important and challenging task. In recent years, the nearest neighbor methods and probabilistic data association filters were proposed. However, the performance of these methods diminishes as the number of measurements increases. In this paper, we propose a robust generative approach to effectively model multiple sensor measurements for tr…

    Submitted 30 November, 2020; originally announced December 2020.

  21. arXiv:2008.11376  [pdf, other]

    cs.LG stat.ML

    Causal Adversarial Network for Learning Conditional and Interventional Distributions

    Authors: Raha Moraffah, Bahman Moraffah, Mansooreh Karami, Adrienne Raglin, Huan Liu

    Abstract: We propose a generative Causal Adversarial Network (CAN) for learning and sampling from conditional and interventional distributions. In contrast to the existing CausalGAN which requires the causal graph to be given, our proposed framework learns the causal relations from the data and generates samples accordingly. The proposed CAN comprises a two-fold process namely Label Generation Network (LGN)…

    Submitted 21 September, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

  22. arXiv:2003.03934  [pdf, ps, other]

    cs.LG stat.ML

    Causal Interpretability for Machine Learning -- Problems, Methods and Evaluation

    Authors: Raha Moraffah, Mansooreh Karami, Ruocheng Guo, Adrienne Raglin, Huan Liu

    Abstract: Machine learning models have had discernible achievements in a myriad of applications. However, most of these models are black-boxes, and it is obscure how the decisions are made by them. This makes the models unreliable and untrustworthy. To provide insights into the decision making processes of these models, a variety of traditional interpretable models have been proposed. Moreover, to generate…

    Submitted 19 March, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

  23. arXiv:1910.12417  [pdf, other]

    cs.LG stat.ML

    Deep causal representation learning for unsupervised domain adaptation

    Authors: Raha Moraffah, Kai Shu, Adrienne Raglin, Huan Liu

    Abstract: Studies show that the representations learned by deep neural networks can be transferred to similar prediction tasks in other domains for which we do not have enough labeled data. However, as we transition to higher layers in the model, the representations become more task-specific and less generalizable. Recent research on deep domain adaptation proposed to mitigate this problem by forcing the de…

    Submitted 27 October, 2019; originally announced October 2019.

  24. arXiv:1808.03333  [pdf, other]

    cs.LG stat.ML

    Linked Causal Variational Autoencoder for Inferring Paired Spillover Effects

    Authors: Vineeth Rakesh, Ruocheng Guo, Raha Moraffah, Nitin Agarwal, Huan Liu

    Abstract: Modeling spillover effects from observational data is an important problem in economics, business, and other fields of research. It helps us infer the causality between two seemingly unrelated sets of events. For example, if consumer spending in the United States declines, it has spillover effects on economies that depend on the U.S. as their largest export market. In this paper, we aim to infer…

    Submitted 3 October, 2018; v1 submitted 9 August, 2018; originally announced August 2018.

    Comments: First two authors contributed equally, 4 pages, CIKM'18