
Showing 1–11 of 11 results for author: Balashankar, A

  1. arXiv:2406.17104  [pdf, other]

    cs.CL

    Automated Adversarial Discovery for Safety Classifiers

    Authors: Yash Kumar Lal, Preethi Lahoti, Aradhana Sinha, Yao Qin, Ananth Balashankar

    Abstract: Safety classifiers are critical in mitigating toxicity on online forums such as social media and in chatbots. Still, they continue to be vulnerable to emergent, and often innumerable, adversarial attacks. Traditional automated adversarial data generation methods, however, tend to produce attacks that are not diverse, but variations of previously observed harm types. We formalize the task of automa…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Published at Fourth Workshop on TrustworthyNLP (TrustNLP) at NAACL 2024

  2. arXiv:2406.16738  [pdf, other]

    cs.LG cs.AI cs.CY

    Inducing Group Fairness in LLM-Based Decisions

    Authors: James Atwood, Preethi Lahoti, Ananth Balashankar, Flavien Prost, Ahmad Beirami

    Abstract: Prompting Large Language Models (LLMs) has created new and interesting means for classifying textual data. While evaluating and remediating group fairness is a well-studied problem in classifier fairness literature, some classical approaches (e.g., regularization) do not carry over, and some new opportunities arise (e.g., prompt-based remediation). We measure fairness of LLM-based classifiers on a…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2404.12318  [pdf, other]

    cs.CL

    Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

    Authors: Zhaofeng Wu, Ananth Balashankar, Yoon Kim, Jacob Eisenstein, Ahmad Beirami

    Abstract: Aligning language models (LMs) based on human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems. However, multilingual human preference data are difficult to obtain at scale, making it challenging to extend this framework to diverse languages. In this work, we evaluate a simple approach for zero-shot cross-lingual alignment, where a reward model is…

    Submitted 18 April, 2024; originally announced April 2024.

  4. arXiv:2310.16959  [pdf, other]

    cs.LG

    Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning

    Authors: Ananth Balashankar, Xiao Ma, Aradhana Sinha, Ahmad Beirami, Yao Qin, Jilin Chen, Alex Beutel

    Abstract: As large language models (LLMs) are widely adopted, new safety issues and policies emerge, to which existing safety classifiers do not generalize well. If we have only observed a few examples of violations of a new safety rule, how can we build a classifier to detect violations? In this paper, we study the novel setting of domain-generalized few-shot learning for LLM-based text safety classifiers.…

    Submitted 25 October, 2023; originally announced October 2023.

  5. arXiv:2310.16955  [pdf, other]

    cs.LG

    Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks

    Authors: Aradhana Sinha, Ananth Balashankar, Ahmad Beirami, Thi Avrahami, Jilin Chen, Alex Beutel

    Abstract: Real-world natural language processing systems need to be robust to human adversaries. Collecting examples of human adversaries for training is an effective but expensive solution. On the other hand, training on synthetic attacks with small perturbations - such as word-substitution - does not actually improve robustness to human adversaries. In this paper, we propose an adversarial training framew…

    Submitted 14 February, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Journal ref: Transactions on Machine Learning Research (2024)

  6. arXiv:2305.13535  [pdf, other]

    cs.CL cs.LG

    Improving Classifier Robustness through Active Generation of Pairwise Counterfactuals

    Authors: Ananth Balashankar, Xuezhi Wang, Yao Qin, Ben Packer, Nithum Thain, Jilin Chen, Ed H. Chi, Alex Beutel

    Abstract: Counterfactual Data Augmentation (CDA) is a commonly used technique for improving robustness in natural language classifiers. However, one fundamental challenge is how to discover meaningful counterfactuals and efficiently label them, with minimal human labeling cost. Most existing methods either completely rely on human-annotated labels, an expensive process which limits the scale of counterfactu…

    Submitted 22 May, 2023; originally announced May 2023.

  7. arXiv:2302.01381  [pdf, other]

    cs.LG cs.CV

    Effective Robustness against Natural Distribution Shifts for Models with Different Training Data

    Authors: Zhouxing Shi, Nicholas Carlini, Ananth Balashankar, Ludwig Schmidt, Cho-Jui Hsieh, Alex Beutel, Yao Qin

    Abstract: "Effective robustness" measures the extra out-of-distribution (OOD) robustness beyond what can be predicted from the in-distribution (ID) performance. Existing effective robustness evaluations typically use a single test set such as ImageNet to evaluate the ID accuracy. This becomes problematic when evaluating models trained on different data distributions, e.g., comparing models trained on ImageN…

    Submitted 28 October, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  8. arXiv:2111.15602  [pdf, other]

    cs.CL cs.LG

    Fine-grained prediction of food insecurity using news streams

    Authors: Ananth Balashankar, Lakshminarayanan Subramanian, Samuel P. Fraiberger

    Abstract: Anticipating the outbreak of a food crisis is crucial to efficiently allocate emergency relief and reduce human suffering. However, existing food insecurity early warning systems rely on risk measures that are often delayed, outdated, or incomplete. Here, we leverage recent advances in deep learning to extract high-frequency precursors to food crises from the text of a large corpus of news article…

    Submitted 17 November, 2021; originally announced November 2021.

  9. arXiv:2010.00678  [pdf, other]

    cs.CL cs.CY

    Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling

    Authors: Yan Shvartzshnaider, Ananth Balashankar, Vikas Patidar, Thomas Wies, Lakshminarayanan Subramanian

    Abstract: This paper formulates a new task of extracting privacy parameters from a privacy policy, through the lens of Contextual Integrity, an established social theory framework for reasoning about privacy norms. Privacy policies, written by lawyers, are lengthy and often comprise incomplete and vague statements. In this paper, we show that traditional NLP tasks, including the recently proposed Question-A…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: 11 pages, 4 figures

  10. arXiv:1910.14120  [pdf, other]

    cs.LG stat.ML

    What is Fair? Exploring Pareto-Efficiency for Fairness Constrained Classifiers

    Authors: Ananth Balashankar, Alyssa Lees, Chris Welty, Lakshminarayanan Subramanian

    Abstract: The potential for learned models to amplify existing societal biases has been broadly recognized. Fairness-aware classifier constraints, which apply equality metrics of performance across subgroups defined on sensitive attributes such as race and gender, seek to rectify inequity but can yield non-uniform degradation in performance for skewed datasets. In certain domains, imbalanced degradation of…

    Submitted 30 October, 2019; originally announced October 2019.

  11. arXiv:1910.11452  [pdf, ps, other]

    cs.LG cs.CY stat.ML

    Fairness Sample Complexity and the Case for Human Intervention

    Authors: Ananth Balashankar, Alyssa Lees

    Abstract: With the aim of building machine learning systems that incorporate standards of fairness and accountability, we explore explicit subgroup sample complexity bounds. The work is motivated by the observation that classifier predictions for real world datasets often demonstrate drastically different metrics, such as accuracy, when subdivided by specific sensitive variable subgroups. The reasons for th…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: Where is the Human? Bridging the Gap Between AI and HCI, CHI Workshop 2019