Showing 1–11 of 11 results for author: Raghuveer, A

Searching in archive cs.
  1. arXiv:2404.04817  [pdf, other]

    cs.CL

    FRACTAL: Fine-Grained Scoring from Aggregate Text Labels

    Authors: Yukti Makhija, Priyanka Agrawal, Rishi Saket, Aravindan Raghuveer

    Abstract: Large language models (LLMs) are being increasingly tuned to power complex generation tasks such as writing, fact-seeking, querying and reasoning. Traditionally, human or model feedback for evaluating and further tuning LLM performance has been provided at the response level, enabling faster and more cost-effective assessments. However, recent works (Amplayo et al. [2022], Wu et al. [2023]) indica…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 22 pages, 1 figure

  2. arXiv:2310.10098  [pdf, ps, other]

    cs.LG stat.ML

    PAC Learning Linear Thresholds from Label Proportions

    Authors: Anand Brahmbhatt, Rishi Saket, Aravindan Raghuveer

    Abstract: Learning from label proportions (LLP) is a generalization of supervised learning in which the training data is available as sets or bags of feature-vectors (instances) along with the average instance-label of each bag. The goal is to train a good instance classifier. While most previous works on LLP have focused on training models on such training data, computational learnability of LLP was only r…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Spotlight paper at Neural Information Processing Systems (NeurIPS), 2023

  3. arXiv:2310.10096  [pdf, other]

    cs.LG stat.ML

    LLP-Bench: A Large Scale Tabular Benchmark for Learning from Label Proportions

    Authors: Anand Brahmbhatt, Mohith Pokala, Rishi Saket, Aravindan Raghuveer

    Abstract: In the task of Learning from Label Proportions (LLP), a model is trained on groups (a.k.a. bags) of instances and their corresponding label proportions to predict labels for individual instances. LLP has been applied predominantly on two types of datasets: image and tabular. In image LLP, bags of fixed size are created by randomly sampling instances from an underlying dataset. Bags created via th…

    Submitted 5 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  4. arXiv:2310.10092  [pdf, ps, other]

    cs.LG stat.ML

    Label Differential Privacy via Aggregation

    Authors: Anand Brahmbhatt, Rishi Saket, Shreyas Havaldar, Anshul Nasery, Aravindan Raghuveer

    Abstract: In many real-world applications, due to recent developments in the privacy landscape, training data may be aggregated to preserve the privacy of sensitive training labels. In the learning from label proportions (LLP) framework, the dataset is partitioned into bags of feature-vectors which are available only with the sum of the labels per bag. A further restriction, which we call learning from bag…

    Submitted 27 November, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

  5. arXiv:2310.08056  [pdf, other]

    cs.LG cs.AI

    Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation

    Authors: Shreyas Havaldar, Navodita Sharma, Shubhi Sareen, Karthikeyan Shanmugam, Aravindan Raghuveer

    Abstract: Learning from Label Proportions (LLP) is a learning problem where only aggregate level labels are available for groups of instances, called bags, during training, and the aim is to get the best performance at the instance-level on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that…

    Submitted 20 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Published as a conference paper at The Twelfth International Conference on Learning Representations (ICLR 2024) & Oral Presentation at Regulatable ML @ NeurIPS 2023

  6. arXiv:2310.07535  [pdf, other]

    cs.LG cs.AI

    Fairness under Covariate Shift: Improving Fairness-Accuracy tradeoff with few Unlabeled Test Samples

    Authors: Shreyas Havaldar, Jatin Chauhan, Karthikeyan Shanmugam, Jay Nandy, Aravindan Raghuveer

    Abstract: Covariate shift in the test data is a common practical phenomenon that can significantly downgrade both the accuracy and the fairness performance of the model. Ensuring fairness across different sensitive groups under covariate shift is of paramount importance due to societal implications like criminal justice. We operate in the unsupervised regime where only a small set of unlabeled test samples a…

    Submitted 8 January, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted at The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

  7. arXiv:2307.03322  [pdf, other]

    cs.CL

    BiPhone: Modeling Inter Language Phonetic Influences in Text

    Authors: Abhirut Gupta, Ananya B. Sai, Richard Sproat, Yuri Vasilevski, James S. Ren, Ambarish Jash, Sukhdeep S. Sodhi, Aravindan Raghuveer

    Abstract: A large number of people are forced to use the Web in a language they have low literacy in due to technology asymmetries. Written text in the second language (L2) from such users often contains a large number of errors that are influenced by their native language (L1). We propose a method to mine phoneme confusions (sounds in L2 that an L1 speaker is likely to conflate) for pairs of L1 and L2. The…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: Accepted at ACL 2023

  8. arXiv:2305.11826  [pdf, other]

    cs.CL cs.AI

    ReTAG: Reasoning Aware Table to Analytic Text Generation

    Authors: Deepanway Ghosal, Preksha Nema, Aravindan Raghuveer

    Abstract: The task of table summarization involves generating text that both succinctly and accurately represents the table or a specific set of highlighted cells within a table. While significant progress has been made in table to text generation techniques, models still mostly generate descriptive summaries, which reiterate the information contained within the table in sentences. Through analysis of popu…

    Submitted 29 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  9. arXiv:2212.01667  [pdf, other]

    cs.CL

    T-STAR: Truthful Style Transfer using AMR Graph as Intermediate Representation

    Authors: Anubhav Jangra, Preksha Nema, Aravindan Raghuveer

    Abstract: Unavailability of parallel corpora for training text style transfer (TST) models is a very challenging yet common scenario. Also, TST models implicitly need to preserve the content while transforming a source sentence into the target style. To tackle these problems, an intermediate representation is often constructed that is devoid of style while still preserving the meaning of the source sentence…

    Submitted 3 December, 2022; originally announced December 2022.

    Comments: Accepted in EMNLP 2022

  10. Multi-Variate Time Series Forecasting on Variable Subsets

    Authors: Jatin Chauhan, Aravindan Raghuveer, Rishi Saket, Jay Nandy, Balaraman Ravindran

    Abstract: We formulate a new inference task in the domain of multivariate time series forecasting (MTSF), called Variable Subset Forecast (VSF), where only a small subset of the variables is available during inference. Variables are absent during inference because of long-term data loss (e.g. sensor failures) or high -> low-resource domain shift between train / test. To the best of our knowledge, robustness…

    Submitted 25 June, 2022; originally announced June 2022.

    Journal ref: ACM SIGKDD 2022 Research Track

  11. arXiv:2109.04443  [pdf, other]

    cs.CL cs.AI cs.LG

    HintedBT: Augmenting Back-Translation with Quality and Transliteration Hints

    Authors: Sahana Ramnath, Melvin Johnson, Abhirut Gupta, Aravindan Raghuveer

    Abstract: Back-translation (BT) of target monolingual corpora is a widely used data augmentation strategy for neural machine translation (NMT), especially for low-resource language pairs. To improve effectiveness of the available BT data, we introduce HintedBT -- a family of techniques which provides hints (through tags) to the encoder and decoder. First, we propose a novel method of using both high and low…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: 17 pages including references and appendix. Accepted at EMNLP 2021