Fair Active Ranking from Pairwise Preferences

Abstract: We investigate the problem of probably approximately correct and fair (PACF) ranking of items by adaptively evoking pairwise comparisons. Given a set of $n$ items that belong to disjoint groups, our goal is to find an $(ε, δ)$-PACF-Ranking according to a fair objective function that we propose. We assume access to an oracle, wherein, for each query, the learner can choose a pair of items and receive stochastic winner feedback from the oracle. Our proposed objective function asks to minimize the $\ell_q$ norm of the error of the groups, where the error of a group is the $\ell_p$ norm of the error of all the items within that group, for $p, q \geq 1$. This generalizes the objective function of $ε$-Best-Ranking, proposed by Saha & Gopalan (2019). By adopting our objective function, we gain the flexibility to explore fundamental fairness concepts like equal or proportionate errors within a unified framework. Adjusting parameters $p$ and $q$ allows tailoring to specific fairness preferences. We present both group-blind and group-aware algorithms and analyze their sample complexity. We provide matching lower bounds up to certain logarithmic factors for group-blind algorithms. For a restricted class of group-aware algorithms, we show that we can get reasonable lower bounds. We conduct comprehensive experiments on both real-world and synthetic datasets to complement our theoretical findings.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 39 pages, 3.1 MB
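
The nested-norm objective described in the abstract above (an $\ell_q$ norm over groups of the $\ell_p$ norm of item errors within each group) can be illustrated with a minimal Python sketch. The function name `group_fair_error` and its inputs are hypothetical, and how the paper defines each item's error is not stated in the abstract, so the errors are taken here as given numbers.

```python
def group_fair_error(item_errors, groups, p=1.0, q=1.0):
    """Return the l_q norm over groups of the l_p norm of item errors per group.

    item_errors: per-item error values (assumed given; their definition is
                 not specified in the abstract).
    groups:      group label of each item, aligned with item_errors.
    """
    if p < 1 or q < 1:
        raise ValueError("the abstract assumes p, q >= 1")
    # Collect absolute item errors by (disjoint) group.
    per_group = {}
    for err, g in zip(item_errors, groups):
        per_group.setdefault(g, []).append(abs(err))
    # Inner l_p norm: the error of each group.
    group_errs = [sum(e ** p for e in errs) ** (1.0 / p)
                  for errs in per_group.values()]
    # Outer l_q norm: aggregate the group errors.
    return sum(ge ** q for ge in group_errs) ** (1.0 / q)
```

With `p = q = 1` this reduces to the plain sum of item errors; larger `p` or `q` put more weight on the worst item within a group or on the worst group, respectively.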

arXiv:2311.10800 [pdf, other]

doi 10.1145/3640537.3641580

The Next 700 ML-Enabled Compiler Optimizations

Authors: S. VenkataKeerthy, Siddharth Jain, Umesh Kalvakuntla, Pranav Sai Gorantla, Rajiv Shailesh Chitale, Eugene Brevdo, Albert Cohen, Mircea Trofin, Ramakrishna Upadrasta

Abstract: There is a growing interest in enhancing compiler optimizations with ML models, yet interactions between compilers and ML frameworks remain challenging. Some optimizations require tightly coupled models and compiler internals, raising issues with modularity, performance and framework independence. Practical deployment and transparency for the end-user are also important concerns. We propose ML-Compiler-Bridge to enable ML model development within a traditional Python framework while making end-to-end integration with an optimizing compiler possible and efficient. We evaluate it on both research and production use cases, for training and inference, over several optimization problems, multiple compilers and its versions, and gym infrastructures.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2308.13242 [pdf, other]

Optimizing Group-Fair Plackett-Luce Ranking Models for Relevance and Ex-Post Fairness

Authors: Sruthi Gorantla, Eshaan Bhansali, Amit Deshpande, Anand Louis

Abstract: In learning-to-rank (LTR), optimizing only the relevance (or the expected ranking utility) can cause representational harm to certain categories of items. Moreover, if there is implicit bias in the relevance scores, LTR models may fail to optimize for true relevance. Previous works have proposed efficient algorithms to train stochastic ranking models that achieve fairness of exposure to the groups ex-ante (or, in expectation), which may not guarantee representation fairness to the groups ex-post, that is, after realizing a ranking from the stochastic ranking model. Typically, ex-post fairness is achieved by post-processing, but previous work does not train stochastic ranking models that are aware of this post-processing. In this paper, we propose a novel objective that maximizes expected relevance only over those rankings that satisfy given representation constraints to ensure ex-post fairness. Building upon recent work on an efficient sampler for ex-post group-fair rankings, we propose a group-fair Plackett-Luce model and show that it can be efficiently optimized for our objective in the LTR framework. Experiments on three real-world datasets show that our group-fair algorithm guarantees fairness alongside usually having better relevance compared to the LTR baselines. In addition, our algorithm also achieves better relevance than post-processing baselines, which also ensures ex-post fairness. Further, when implicit bias is injected into the training data, our algorithm typically outperforms existing LTR baselines in relevance.

Submitted 25 August, 2023; originally announced August 2023.

Comments: 20 pages

arXiv:2306.11964 [pdf, other]

Sampling Individually-Fair Rankings that are Always Group Fair

Authors: Sruthi Gorantla, Anay Mehrotra, Amit Deshpande, Anand Louis

Abstract: Rankings on online platforms help their end-users find the relevant information -- people, news, media, and products -- quickly. Fair ranking tasks, which ask to rank a set of items to maximize utility subject to satisfying group-fairness constraints, have gained significant interest in the Algorithmic Fairness, Information Retrieval, and Machine Learning literature. Recent works, however, identify uncertainty in the utilities of items as a primary cause of unfairness and propose introducing randomness in the output. This randomness is carefully chosen to guarantee an adequate representation of each item (while accounting for the uncertainty). However, due to this randomness, the output rankings may violate group fairness constraints. We give an efficient algorithm that samples rankings from an individually-fair distribution while ensuring that every output ranking is group fair. The expected utility of the output ranking is at least $α$ times the utility of the optimal fair solution. Here, $α$ depends on the utilities, position-discounts, and constraints -- it approaches 1 as the range of utilities or the position-discounts shrinks, or when utilities satisfy distributional assumptions. Empirically, we observe that our algorithm achieves individual and group fairness and that Pareto dominates the state-of-the-art baselines.

Submitted 20 June, 2023; originally announced June 2023.

Comments: Full version of a paper accepted for presentation in ACM AIES 2023

arXiv:2208.10095 [pdf, other]

Socially Fair Center-based and Linear Subspace Clustering

Authors: Sruthi Gorantla, Kishen N. Gowda, Amit Deshpande, Anand Louis

Abstract: Center-based clustering (e.g., $k$-means, $k$-medians) and clustering using linear subspaces are two most popular techniques to partition real-world data into smaller clusters. However, when the data consists of sensitive demographic groups, significantly different clustering cost per point for different sensitive groups can lead to fairness-related harms (e.g., different quality-of-service). The goal of socially fair clustering is to minimize the maximum cost of clustering per point over all groups. In this work, we propose a unified framework to solve socially fair center-based clustering and linear subspace clustering, and give practical, efficient approximation algorithms for these problems. We do extensive experiments to show that on multiple benchmark datasets our algorithms either closely match or outperform state-of-the-art baselines.

Submitted 22 August, 2022; originally announced August 2022.

Comments: 17 pages
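
As a rough illustration of the socially fair objective stated in the abstract above (the maximum, over groups, of the clustering cost per point), the following Python sketch evaluates that quantity for a fixed set of centers. It only computes the objective and is not the authors' approximation algorithm; the function name `socially_fair_cost` and the choice of squared Euclidean ($k$-means style) cost are assumptions made for the example.

```python
def socially_fair_cost(points, groups, centers):
    """Maximum over groups of the average squared distance to the nearest center.

    points:  list of coordinate tuples.
    groups:  group label of each point, aligned with points.
    centers: list of coordinate tuples (candidate cluster centers).
    """
    totals, counts = {}, {}
    for pt, g in zip(points, groups):
        # k-means style cost of one point: squared distance to its nearest center.
        d2 = min(sum((a - b) ** 2 for a, b in zip(pt, c)) for c in centers)
        totals[g] = totals.get(g, 0.0) + d2
        counts[g] = counts.get(g, 0) + 1
    # Per-point (average) cost of each group; the objective is the worst group.
    return max(totals[g] / counts[g] for g in totals)
```

A standard clustering objective would instead sum or average the cost over all points, which can hide a small group that is served poorly; taking the maximum over groups makes that group's cost the quantity being minimized.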

arXiv:2203.00887 [pdf, other]

Sampling Ex-Post Group-Fair Rankings

Authors: Sruthi Gorantla, Amit Deshpande, Anand Louis

Abstract: Randomized rankings have been of recent interest to achieve ex-ante fairer exposure and better robustness than deterministic rankings. We propose a set of natural axioms for randomized group-fair rankings and prove that there exists a unique distribution $D$ that satisfies our axioms and is supported only over ex-post group-fair rankings, i.e., rankings that satisfy given lower and upper bounds on group-wise representation in the top-$k$ ranks. Our problem formulation works even when there is implicit bias, incomplete relevance information, or only ordinal ranking is available instead of relevance scores or utility values. We propose two algorithms to sample a random group-fair ranking from the distribution $D$ mentioned above. Our first dynamic programming-based algorithm samples ex-post group-fair rankings uniformly at random in time $O(k^2\ell)$, where $\ell$ is the number of groups. Our second random walk-based algorithm samples ex-post group-fair rankings from a distribution $δ$-close to $D$ in total variation distance and has expected running time $O^*(k^2\ell^2)$, when there is a sufficient gap between the given upper and lower bounds on the group-wise representation. The former does exact sampling, but the latter runs significantly faster on real-world data sets for larger values of $k$. We give empirical evidence that our algorithms compare favorably against recent baselines for fairness and ranking utility on real-world data sets.

Submitted 29 May, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

Comments: 31 pages. Accepted for publication as a full paper in IJCAI 2023
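
The ex-post group-fairness condition in the abstract above (given lower and upper bounds on each group's representation in the top-$k$ ranks) can be checked with a short Python sketch. This is only a feasibility check under assumed conventions, not either of the paper's sampling algorithms; the function name `is_ex_post_group_fair` and the dictionary-based bounds are hypothetical.

```python
from collections import Counter


def is_ex_post_group_fair(ranking_groups, k, lower, upper):
    """Check group-wise representation bounds in the top-k ranks.

    ranking_groups: group label of the item at each rank, best rank first.
    lower, upper:   dicts mapping group label -> min / max count in the top k.
                    Assumption: a missing lower bound means 0 and a missing
                    upper bound means k (no constraint).
    """
    counts = Counter(ranking_groups[:k])
    all_groups = set(lower) | set(upper) | set(counts)
    return all(
        lower.get(g, 0) <= counts.get(g, 0) <= upper.get(g, k)
        for g in all_groups
    )
```

For instance, with group labels `["a", "a", "b", "a"]` and bounds of at least 1 and at most 2 per group, the top 3 ranks satisfy the constraints while the top 4 do not, because group `a` then appears three times.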

arXiv:2010.06986 [pdf, other]

On the Problem of Underranking in Group-Fair Ranking

Authors: Sruthi Gorantla, Amit Deshpande, Anand Louis

Abstract: Search and recommendation systems, such as search engines, recruiting tools, online marketplaces, news, and social media, output ranked lists of content, products, and sometimes, people. Credit ratings, standardized tests, risk assessments output only a score, but are also used implicitly for ranking. Bias in such ranking systems, especially among the top ranks, can worsen social and economic inequalities, polarize opinions, and reinforce stereotypes. On the other hand, a bias correction for minority groups can cause more harm if perceived as favoring group-fair outcomes over meritocracy. In this paper, we formulate the problem of underranking in group-fair rankings, which was not addressed in previous work. Most group-fair ranking algorithms post-process a given ranking and output a group-fair ranking. We define underranking based on how close the group-fair rank of each item is to its original rank, and prove a lower bound on the trade-off achievable for simultaneous underranking and group fairness in ranking. We give a fair ranking algorithm that takes any given ranking and outputs another ranking with simultaneous underranking and group fairness guarantees comparable to the lower bound we prove. Our algorithm works with group fairness constraints for any number of groups. Our experimental results confirm the theoretical trade-off between underranking and group fairness, and also show that our algorithm achieves the best of both when compared to the state-of-the-art baselines.

Submitted 18 February, 2021; v1 submitted 24 September, 2020; originally announced October 2020.

Comments: 27 pages

arXiv:1902.08342 [pdf, other]

Aspect-Sentiment Embeddings for Company Profiling and Employee Opinion Mining

Authors: Rajiv Bajpai, Devamanyu Hazarika, Kunal Singh, Sruthi Gorantla, Erik Cambria, Roger Zimmerman

Abstract: With the multitude of companies and organizations abound today, ranking them and choosing one out of the many is a difficult and cumbersome task. Although there are many available metrics that rank companies, there is an inherent need for a generalized metric that takes into account the different aspects that constitute employee opinions of the companies. In this work, we aim to overcome the aforementioned problem by generating aspect-sentiment based embedding for the companies by looking into reliable employee reviews of them. We created a comprehensive dataset of company reviews from the famous website Glassdoor.com and employed a novel ensemble approach to perform aspect-level sentiment analysis. Although a relevant amount of work has been done on reviews centered on subjects like movies, music, etc., this work is the first of its kind. We also provide several insights from the collated embeddings, thus helping users gain a better understanding of their options as well as select companies using customized preferences.

Submitted 21 February, 2019; originally announced February 2019.

arXiv:1901.02523 [pdf, other]

Construction and Analysis of Posterior Matching in Arbitrary Dimensions via Optimal Transport

Authors: Diego A. Mesa, Rui Ma, Siva K. Gorantla, Todd P. Coleman

Abstract: The posterior matching scheme, for feedback encoding of a message point lying on the unit interval over memoryless channels, maximizes mutual information for an arbitrary number of channel uses. However, it in general does not always achieve any positive rate; so far, elaborate analyses have been required to show that it achieves any positive rate below capacity. More recent efforts have introduced a random "dither" shared by the encoder and decoder to the problem formulation, to simplify analyses and guarantee that the randomized scheme achieves any rate below capacity. Motivated by applications (e.g. human-computer interfaces) where (a) common randomness shared by the encoder and decoder may not be feasible and (b) the message point lies in a higher dimensional space, we focus here on the original formulation without common randomness, and use optimal transport theory to generalize the scheme for a message point in a higher dimensional space. By defining a stricter, almost sure, notion of message decoding, we use classical probabilistic techniques (e.g. change of measure and martingale convergence) to establish succinct necessary and sufficient conditions on when the message point can be recovered from infinite observations: Birkhoff ergodicity of a random process sequentially generated by the encoder. We also show a surprising "all or nothing" result: the same ergodicity condition is necessary and sufficient to achieve any rate below capacity. We provide applications of this message point framework in human-computer interfaces and multi-antenna communications.

Submitted 8 January, 2019; originally announced January 2019.

Comments: Submitted to the IEEE Transactions on Information Theory

arXiv:1805.06413 [pdf, other]

CASCADE: Contextual Sarcasm Detection in Online Discussion Forums

Authors: Devamanyu Hazarika, Soujanya Poria, Sruthi Gorantla, Erik Cambria, Roger Zimmermann, Rada Mihalcea

Abstract: The literature in automated sarcasm detection has mainly focused on lexical, syntactic and semantic-level analysis of text. However, a sarcastic sentence can be expressed with contextual presumptions, background and commonsense knowledge. In this paper, we propose CASCADE (a ContextuAl SarCasm DEtector) that adopts a hybrid approach of both content and context-driven modeling for sarcasm detection in online social media discussions. For the latter, CASCADE aims at extracting contextual information from the discourse of a discussion thread. Also, since the sarcastic nature and form of expression can vary from person to person, CASCADE utilizes user embeddings that encode stylometric and personality features of the users. When used along with content-based feature extractors such as Convolutional Neural Networks (CNNs), we see a significant boost in the classification performance on a large Reddit corpus.

Submitted 16 May, 2018; originally announced May 2018.

Comments: Accepted in COLING 2018

arXiv:1102.0250 [pdf, ps, other]

Information-Theoretic Viewpoints on Optimal Causal Coding-Decoding Problems

Authors: Siva Gorantla, Todd Coleman

Abstract: In this paper we consider an interacting two-agent sequential decision-making problem consisting of a Markov source process, a causal encoder with feedback, and a causal decoder. Motivated by a desire to foster links between control and information theory, we augment the standard formulation by considering general alphabets and a cost function operating on current and previous symbols. Using dynamic programming, we provide a structural result whereby an optimal scheme exists that operates on appropriate sufficient statistics. We emphasize an example where the decoder alphabet lies in a space of beliefs on the source alphabet, and the additive cost function is a log likelihood ratio pertaining to sequential information gain. We also consider the inverse optimal control problem, where a fixed encoder/decoder pair satisfying statistical conditions is shown to be optimal for some cost function, using probabilistic matching. We provide examples of the applicability of this framework to communication with feedback, hidden Markov models and the nonlinear filter, decentralized control, brain-machine interfaces, and queuing theory.

Submitted 1 February, 2011; originally announced February 2011.

Comments: submitted to IEEE Transactions on Information Theory

arXiv:1101.1934 [pdf, ps, other]

doi 10.1109/TIT.2012.2227671

doi 10.1109/ISIT.2010.5513239

Bit-wise Unequal Error Protection for Variable Length Block Codes with Feedback

Authors: Baris Nakiboglu, Siva K. Gorantla, Lizhong Zheng, Todd P. Coleman

Abstract: The bit-wise unequal error protection problem, for the case when the number of groups of bits $\ell$ is fixed, is considered for variable length block codes with feedback. An encoding scheme based on fixed length block codes with erasures is used to establish inner bounds to the achievable performance for finite expected decoding time. A new technique for bounding the performance of variable length block codes is used to establish outer bounds to the performance for a given expected decoding time. The inner and the outer bounds match one another asymptotically and characterize the achievable region of rate-exponent vectors, completely. The single message message-wise unequal error protection problem for variable length block codes with feedback is also solved as a necessary step on the way.

Submitted 17 December, 2012; v1 submitted 10 January, 2011; originally announced January 2011.

Comments: 41 pages, 3 figures

Journal ref: IEEE Transactions on Information Theory, 59(3):1475-1504, March 2013

Showing 1–12 of 12 results for author: Gorantla, S