
Showing 1–7 of 7 results for author: Nitsure, A

  1. arXiv:2406.06425 [pdf, other]

    stat.ML cs.LG math.ST

    Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking

    Authors: Gabriel Rioux, Apoorva Nitsure, Mattia Rigotti, Kristjan Greenewald, Youssef Mroueh

    Abstract: Stochastic dominance is an important concept in probability theory, econometrics and social choice theory for robustly modeling agents' preferences between random outcomes. While many works have been dedicated to the univariate case, little has been done in the multivariate scenario, wherein an agent has to decide between different multivariate outcomes. By exploiting a characterization of multiva…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 27 pages, 2 figures

  2. arXiv:2406.05882 [pdf, other]

    cs.LG stat.ML

    Distributional Preference Alignment of LLMs via Optimal Transport

    Authors: Igor Melnyk, Youssef Mroueh, Brian Belgodere, Mattia Rigotti, Apoorva Nitsure, Mikhail Yurochkin, Kristjan Greenewald, Jiri Navratil, Jerret Ross

    Abstract: Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose in this paper Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples stochastically…

    Submitted 9 June, 2024; originally announced June 2024.

  3. arXiv:2406.04379 [pdf, other]

    cs.SE cs.AI cs.CL

    VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation

    Authors: Prashanth Vijayaraghavan, Luyao Shi, Stefano Ambrogio, Charles Mackin, Apoorva Nitsure, David Beymer, Ehsan Degan

    Abstract: With the unprecedented advancements in Large Language Models (LLMs), their application domains have expanded to include code generation tasks across various programming languages. While significant progress has been made in enhancing LLMs for popular programming languages, there exists a notable gap in comprehensive evaluation frameworks tailored for Hardware Description Languages (HDLs), particul…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 Figures, LAD'24

  4. arXiv:2310.07132 [pdf, other]

    cs.LG math.ST q-fin.RM stat.ML

    Risk Aware Benchmarking of Large Language Models

    Authors: Apoorva Nitsure, Youssef Mroueh, Mattia Rigotti, Kristjan Greenewald, Brian Belgodere, Mikhail Yurochkin, Jiri Navratil, Igor Melnyk, Jerret Ross

    Abstract: We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic dominance of real random variables. We show that the second order statistics in this test are linked to mean-risk models commonly used in econometrics and math…

    Submitted 9 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  5. arXiv:2304.10819 [pdf, other]

    cs.LG cs.AI stat.ML

    Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

    Authors: Brian Belgodere, Pierre Dognin, Adam Ivankay, Igor Melnyk, Youssef Mroueh, Aleksandra Mojsilovic, Jiri Navratil, Apoorva Nitsure, Inkit Padhi, Mattia Rigotti, Jerret Ross, Yair Schiff, Radhika Vedpathak, Richard A. Young

    Abstract: Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framew…

    Submitted 9 June, 2024; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: submitted

  6. arXiv:2302.12178 [pdf, other]

    cs.AI

    A Scalable Space-efficient In-database Interpretability Framework for Embedding-based Semantic SQL Queries

    Authors: Prabhakar Kudva, Rajesh Bordawekar, Apoorva Nitsure

    Abstract: AI-Powered database (AI-DB) is a novel relational database system that uses a self-supervised neural network, database embedding, to enable semantic SQL queries on relational tables. In this paper, we describe an architecture and implementation of in-database interpretability infrastructure designed to provide simple, transparent, and relatable insights into ranked results of semantic SQL queries…

    Submitted 1 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

  7. arXiv:2005.09617

    cs.DB cs.AI cs.IR

    Unlocking New York City Crime Insights using Relational Database Embeddings

    Authors: Apoorva Nitsure, Rajesh Bordawekar, Jose Neves

    Abstract: This version withdrawn by arXiv administrators because the author did not have the right to agree to our license at the time of submission.

    Submitted 20 May, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: This version withdrawn by arXiv administrators because the author did not have the right to agree to our license at the time of submission
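The first and second order stochastic dominance of real random variables that entry 4 builds its test on (and that entry 2 refers to for reward distributions) can be illustrated with a minimal empirical check. This is only a sketch under stated assumptions, not the method of either paper: it compares two samples of equal size directly, with no statistical significance testing and no optimal transport, and the function names are hypothetical. X first-order dominates Y when its quantile function is everywhere at least Y's; X second-order dominates Y when its integrated quantile function, the integral of Q_X(u) from 0 to p, is everywhere at least Y's.

```python
import numpy as np


def first_order_dominates(x, y):
    """Hypothetical helper: True if sample x first-order dominates sample y.

    Assumes len(x) == len(y), so the empirical quantile functions are step
    functions on the same grid and comparing sorted values suffices.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return bool(np.all(x >= y))


def second_order_dominates(x, y, atol=1e-12):
    """Hypothetical helper: True if sample x second-order dominates sample y.

    Compares the empirical integrated quantile functions at p = k/n.
    With equal sample sizes both curves are piecewise linear on the same
    knots, so checking the knots is enough.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(x)
    gx = np.cumsum(x) / n  # integral of the quantile function of x up to k/n
    gy = np.cumsum(y) / n
    return bool(np.all(gx >= gy - atol))


if __name__ == "__main__":
    safe = [2.5, 2.5, 2.5, 2.5]
    risky = [1.0, 2.0, 3.0, 4.0]
    print(first_order_dominates(risky, safe), first_order_dominates(safe, risky))
    print(second_order_dominates(safe, risky), second_order_dominates(risky, safe))
```

With equal means, neither sample in the usage example dominates the other in first order, but the constant sample second-order dominates the spread-out one, which is the ordering a risk-averse agent would choose.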