Search | arXiv e-print repository

The Penrose limit of the Weyl double copy

Authors: Samarth Chawla, Kwinten Fransen, Cynthia Keeler

Abstract: We embed the Penrose limit into the Weyl classical double copy. Thereby, we provide a lift of the double copy properties of plane wave spacetimes into black hole geometries and we open a novel avenue towards taking the classical double copy beyond statements about algebraically special backgrounds. In particular, the Penrose limit, viewed as the leading order Fermi coordinate expansion around a null geodesic, complements approaches leveraging asymptotic flatness such as the asymptotic Weyl double copy. Along the way, we show how our embedding of the Penrose limit within the Weyl double copy naturally fixes the functional ambiguity in the double copy for Petrov type N spacetimes. We also highlight the utility of a spinorial approach to the Penrose limit. In particular, we use this spinorial approach to derive a simple analytical expression for arbitrary Penrose limits of four-dimensional, vacuum type D spacetimes.

Submitted 10 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: 44+18 pages

arXiv:2405.17130 [pdf, other]

Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training

Authors: Enes Altinisik, Safa Messaoud, Husrev Taha Sencar, Hassan Sajjad, Sanjay Chawla

Abstract: Despite being a heavily researched topic, Adversarial Training (AT) is rarely, if ever, deployed in practical AI systems for two primary reasons: (i) the gained robustness is frequently accompanied by a drop in generalization and (ii) generating adversarial examples (AEs) is computationally prohibitively expensive. To address these limitations, we propose SMAAT, a new AT algorithm that leverages the manifold conjecture, stating that off-manifold AEs lead to better robustness while on-manifold AEs result in better generalization. Specifically, SMAAT aims at generating a higher proportion of off-manifold AEs by perturbing the intermediate deepnet layer with the lowest intrinsic dimension. This systematically results in better scalability compared to classical AT as it reduces the length of the PGD chains required for generating the AEs. Additionally, our study provides, to the best of our knowledge, the first explanation for the difference in the generalization and robustness trends between vision and language models, i.e., AT results in a drop in generalization in vision models whereas, in encoder-based language models, generalization either improves or remains unchanged. We show that vision transformers and decoder-based models tend to have low intrinsic dimensionality in the earlier layers of the network (more off-manifold AEs), while encoder-based models have low intrinsic dimensionality in the later layers.
We demonstrate the efficacy of SMAAT on several tasks, including robustifying (i) sentiment classifiers, (ii) safety filters in decoder-based models, and (iii) retrievers in RAG setups. SMAAT requires only 25-33% of the GPU time compared to standard AT, while significantly improving robustness across all applications and maintaining comparable generalization.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.00987 [pdf, other]

S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic

Authors: Safa Messaoud, Billel Mokeddem, Zhenghai Xue, Linsey Pang, Bo An, Haipeng Chen, Sanjay Chawla

Abstract: Learning expressive stochastic policies instead of deterministic ones has been proposed to achieve better stability, sample complexity, and robustness. Notably, in Maximum Entropy Reinforcement Learning (MaxEnt RL), the policy is modeled as an expressive Energy-Based Model (EBM) over the Q-values. However, this formulation requires the estimation of the entropy of such EBMs, which is an open problem. To address this, previous MaxEnt RL methods either implicitly estimate the entropy, resulting in high computational complexity and variance (SQL), or follow a variational inference procedure that fits simplified actor distributions (e.g., Gaussian) for tractability (SAC). We propose Stein Soft Actor-Critic (S$^2$AC), a MaxEnt RL algorithm that learns expressive policies without compromising efficiency. Specifically, S$^2$AC uses parameterized Stein Variational Gradient Descent (SVGD) as the underlying policy. We derive a closed-form expression of the entropy of such policies. Our formula is computationally efficient and only depends on first-order derivatives and vector products. Empirical results show that S$^2$AC yields more optimal solutions to the MaxEnt objective than SQL and SAC in the multi-goal environment, and outperforms SAC and SQL on the MuJoCo benchmark. Our code is available at: https://github.com/SafaMessaoud/S2AC-Energy-Based-RL-with-Stein-Soft-Actor-Critic

Submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted for publication at ICLR 2024

arXiv:2404.14679 [pdf, ps, other]

A Multi-Dimensional Online Contention Resolution Scheme for Revenue Maximization

Authors: Shuchi Chawla, Dimitris Christou, Trung Dang, Zhiyi Huang, Gregory Kehne, Rojin Rezvan

Abstract: We study multi-buyer multi-item sequential item pricing mechanisms for revenue maximization with the goal of approximating a natural fractional relaxation -- the ex ante optimal revenue. We assume that buyers' values are subadditive but make no assumptions on the value distributions. While the optimal revenue, and therefore also the ex ante benchmark, is inapproximable by any simple mechanism in this context, previous work has shown that a weaker benchmark that optimizes over so-called ``buy-many'' mechanisms can be approximable. Approximations are known, in particular, for settings with either a single buyer or many unit-demand buyers. We extend these results to the much broader setting of many subadditive buyers. We show that the ex ante buy-many revenue can be approximated via sequential item pricings to within an $O(\log^2 m)$ factor, where $m$ is the number of items. We also show that a logarithmic dependence on $m$ is necessary. Our approximation is achieved through the construction of a new multi-dimensional Online Contention Resolution Scheme (OCRS), that provides an online rounding of the optimal ex ante solution. Chawla et al. arXiv:2204.01962 previously constructed an OCRS for revenue for unit-demand buyers, but their construction relied heavily on the ``almost single dimensional'' nature of unit-demand values. Prior to that work, OCRSes have only been studied in the context of social welfare maximization for single-parameter buyers.
For the welfare objective, constant-factor approximations have been demonstrated for a wide range of combinatorial constraints on item allocations and classes of buyer valuation functions. Our work opens up the possibility of a similar success story for revenue maximization.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 39 pages

arXiv:2404.05219 [pdf, other]

Out-of-Distribution Data: An Acquaintance of Adversarial Examples -- A Survey

Authors: Naveen Karunanayake, Ravin Gunawardena, Suranga Seneviratne, Sanjay Chawla

Abstract: Deep neural networks (DNNs) deployed in real-world applications can encounter out-of-distribution (OOD) data and adversarial examples. These represent distinct forms of distributional shifts that can significantly impact DNNs' reliability and robustness. Traditionally, research has addressed OOD detection and adversarial robustness as separate challenges. This survey focuses on the intersection of these two areas, examining how the research community has investigated them together. Consequently, we identify two key research directions: robust OOD detection and unified robustness. Robust OOD detection aims to differentiate between in-distribution (ID) data and OOD data, even when they are adversarially manipulated to deceive the OOD detector. Unified robustness seeks a single approach to make DNNs robust against both adversarial attacks and OOD inputs. Accordingly, first, we establish a taxonomy based on the concept of distributional shifts. This framework clarifies how robust OOD detection and unified robustness relate to other research areas addressing distributional shifts, such as OOD detection, open set recognition, and anomaly detection. Subsequently, we review existing work on robust OOD detection and unified robustness. Finally, we highlight the limitations of the existing work and propose promising research directions that explore adversarial and OOD inputs within a unified framework.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2402.08789 [pdf, other]

Leveraging cough sounds to optimize chest x-ray usage in low-resource settings

Authors: Alexander Philip, Sanya Chawla, Lola Jover, George P. Kafentzis, Joe Brew, Vishakh Saraf, Shibu Vijayan, Peter Small, Carlos Chaccour

Abstract: Chest X-ray is a commonly used tool during triage, diagnosis and management of respiratory diseases. In resource-constrained settings, optimizing this resource can lead to valuable cost savings for the health care system and the patients as well as to an improvement in consult time. We used prospectively-collected data from 137 patients referred for chest X-ray at the Christian Medical Center and Hospital (CMCH) in Purnia, Bihar, India. Each patient provided at least five coughs while awaiting radiography. Collected cough sounds were analyzed using acoustic AI methods. Cross-validation was done on temporal and spectral features on the cough sounds of each patient. Features were summarized using standard statistical approaches. Three models were developed, tested and compared in their capacity to predict an abnormal result in the chest X-ray. All three methods yielded models that could discriminate to some extent between normal and abnormal, with the logistic regression performing best with an area under the receiver operating characteristic curve ranging from 0.7 to 0.78. Despite limitations and its relatively small sample size, this study shows that AI-enabled algorithms can use cough sounds to predict which individuals presenting for chest radiographic examination will have a normal or abnormal result. These results call for expanding this research given the potential optimization of limited health care resources in low- and middle-income countries.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.07483 [pdf, other]

T-RAG: Lessons from the LLM Trenches

Authors: Masoomali Fatehkia, Ji Kim Lucas, Sanjay Chawla

Abstract: Large Language Models (LLMs) have shown remarkable language capabilities, fueling attempts to integrate them into applications across a wide range of domains. An important application area is question answering over private enterprise documents, where the main considerations are data security, which necessitates applications that can be deployed on-prem, limited computational resources and the need for a robust application that correctly responds to queries. Retrieval-Augmented Generation (RAG) has emerged as the most prominent framework for building LLM-based applications. While building a RAG is relatively straightforward, making it a robust and reliable application requires extensive customization and relatively deep knowledge of the application domain. We share our experiences building and deploying an LLM application for question answering over private organizational documents. Our application combines the use of RAG with a finetuned open-source LLM. Additionally, our system, which we call Tree-RAG (T-RAG), uses a tree structure to represent entity hierarchies within the organization. This is used to generate a textual description to augment the context when responding to user queries pertaining to entities within the organization's hierarchy. Our evaluations, including a Needle in a Haystack test, show that this combination performs better than a simple RAG or finetuning implementation. Finally, we share some lessons learned based on our experiences building an LLM application for real-world use.

Submitted 6 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: Added Needle in a Haystack analysis for T-RAG

arXiv:2311.14754 [pdf, other]

ExCeL : Combined Extreme and Collective Logit Information for Enhancing Out-of-Distribution Detection

Authors: Naveen Karunanayake, Suranga Seneviratne, Sanjay Chawla

Abstract: Deep learning models often exhibit overconfidence in predicting out-of-distribution (OOD) data, underscoring the crucial role of OOD detection in ensuring reliability in predictions. Among various OOD detection approaches, post-hoc detectors have gained significant popularity, primarily due to their ease of use and implementation. However, the effectiveness of most post-hoc OOD detectors has been constrained as they rely solely either on extreme information, such as the maximum logit, or on the collective information (i.e., information spanned across classes or training samples) embedded within the output layer. In this paper, we propose ExCeL that combines both extreme and collective information within the output layer for enhanced accuracy in OOD detection. We leverage the logit of the top predicted class as the extreme information (i.e., the maximum logit), while the collective information is derived in a novel approach that involves assessing the likelihood of other classes appearing in subsequent ranks across various training samples. Our idea is motivated by the observation that, for in-distribution (ID) data, the ranking of classes beyond the predicted class is more deterministic compared to that in OOD data. Experiments conducted on CIFAR100 and ImageNet-200 datasets demonstrate that ExCeL is consistently among the five top-performing methods out of twenty-one existing post-hoc baselines when the joint performance on near-OOD and far-OOD is considered (i.e., in terms of AUROC and FPR95).
Furthermore, ExCeL shows the best overall performance across both datasets, unlike other baselines that work best on one dataset but have a performance drop on the other.

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2309.15254 [pdf, ps, other]

doi 10.1002/we.2917

Modeling the effect of wind speed and direction shear on utility-scale wind turbine power production

Authors: Storm A. Mata, Juan José Pena Martínez, Jesús Bas Quesada, Felipe Palou Larrañaga, Neeraj Yadav, Jasvipul S. Chawla, Varun Sivaram, Michael F. Howland

Abstract: Wind speed and direction variations across the rotor affect power production. As utility-scale turbines extend higher into the atmospheric boundary layer (ABL) with larger rotor diameters and hub heights, they increasingly encounter more complex wind speed and direction variations. We assess three models for power production that account for wind speed and direction shear. Two are based on actuator disc representations and the third is a blade element representation. We also evaluate the predictions from a standard power curve model that has no knowledge of wind shear. The predictions from each model, driven by wind profile measurements from a profiling LiDAR, are compared to concurrent power measurements from an adjacent utility-scale wind turbine. In the field measurements of the utility-scale turbine, discrete combinations of speed and direction shear induce changes in power production of -19% to +34% relative to the turbine power curve for a given hub height wind speed. Positive speed shear generally corresponds to over-performance and positive direction shear to under-performance, relative to the power curve. Overall, the blade element model produces both higher correlation and lower error relative to the other models, but its quantitative accuracy depends on induction and controller sub-models. To further assess the influence of complex, non-monotonic wind profiles, we also drive the models with best-fit power law wind speed profiles and linear wind direction profiles.
These idealized inputs produce qualitative and quantitative differences in power predictions from each model, demonstrating that time-varying, non-monotonic wind shear affects wind power production.

Submitted 26 September, 2023; originally announced September 2023.

Comments: 28 pages, 14 figures in main body, 1 figure in Appendix A, to be published in Wiley Wind Energy

Journal ref: Wind Energy (2024) 1-27

arXiv:2307.05717 [pdf, other]

Towards Mobility Data Science (Vision Paper)

Authors: Mohamed Mokbel, Mahmoud Sakr, Li Xiong, Andreas Züfle, Jussara Almeida, Taylor Anderson, Walid Aref, Gennady Andrienko, Natalia Andrienko, Yang Cao, Sanjay Chawla, Reynold Cheng, Panos Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, Dimitrios Gunopulos, Christian Jensen, Joon-Seok Kim, Kyoung-Sook Kim, Peer Kröger, John Krumm, Johannes Lauer, Amr Magdy, Mario Nascimento, et al. (23 additional authors not shown)

Abstract: Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years.

Submitted 7 March, 2024; v1 submitted 21 June, 2023; originally announced July 2023.

Comments: Updated to reflect the major revision for ACM Transactions on Spatial Algorithms and Systems (TSAS). This version reflects the final version accepted by ACM TSAS

arXiv:2306.11604 [pdf, ps, other]

Composition of nested embeddings with an application to outlier removal

Authors: Shuchi Chawla, Kristin Sheridan

Abstract: We study the design of embeddings into Euclidean space with outliers. Given a metric space $(X,d)$ and an integer $k$, the goal is to embed all but $k$ points in $X$ (called the ``outliers'') into $\ell_2$ with the smallest possible distortion $c$. Finding the optimal distortion $c$ for a given outlier set size $k$, or alternately the smallest $k$ for a given target distortion $c$, are both NP-hard problems. In fact, it is UGC-hard to approximate $k$ to within a factor smaller than $2$ even when the metric sans outliers is isometrically embeddable into $\ell_2$. We consider bi-criteria approximations. Our main result is a polynomial time algorithm that approximates the outlier set size to within an $O(\log^2 k)$ factor and the distortion to within a constant factor. The main technical component in our result is an approach for constructing Lipschitz extensions of embeddings into Banach spaces (such as $\ell_p$ spaces). We consider a stronger version of Lipschitz extension that we call a \textit{nested composition of embeddings}: given a low distortion embedding of a subset $S$ of the metric space $X$, our goal is to extend this embedding to all of $X$ such that the distortion over $S$ is preserved, whereas the distortion over the remaining pairs of points in $X$ is bounded by a function of the size of $X\setminus S$. Prior work on Lipschitz extension considers settings where the size of $X$ is potentially much larger than that of $S$ and the expansion bounds depend on $|S|$.
In our setting, the set $S$ is nearly all of $X$ and the remaining set $X\setminus S$, a.k.a. the outliers, is small. We achieve an expansion bound that is logarithmic in $|X\setminus S|$.

Submitted 6 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: 28 pages (including 2 appendices), 5 figures

arXiv:2306.02417 [pdf, other]

Black Hole Horizons from the Double Copy

Authors: Samarth Chawla, Cynthia Keeler

Abstract: We describe a procedure for locating black hole horizons in Kerr-Schild spacetimes in the double copy paradigm. Using only single- and zeroth-copy data on flat spacetime, our procedure predicts the existence of trapped surfaces in the double-copy gravitational solution. We show explicitly how this procedure locates the horizon of the Schwarzschild black hole and the general Myers-Perry black hole.

Submitted 4 June, 2023; originally announced June 2023.

Comments: 25 pages, 2 figures

arXiv:2304.01958 [pdf, other]

Online Time-Windows TSP with Predictions

Authors: Shuchi Chawla, Dimitris Christou

Abstract: In the Time-Windows TSP (TW-TSP) we are given requests at different locations on a network; each request is endowed with a reward and an interval of time; the goal is to find a tour that visits as much reward as possible during the corresponding time window. For the online version of this problem, where each request is revealed at the start of its time window, no finite competitive ratio can be obtained. We consider a version of the problem where the algorithm is presented with predictions of where and when the online requests will appear, without any knowledge of the quality of this side information. Vehicle routing problems such as the TW-TSP can be very sensitive to errors or changes in the input due to the hard time-window constraints, and it is unclear whether imperfect predictions can be used to obtain a finite competitive ratio. We show that good performance can be achieved by explicitly building slack into the solution. Our main result is an online algorithm that achieves a competitive ratio logarithmic in the diameter of the underlying network, matching the performance of the best offline algorithm to within factors that depend on the quality of the provided predictions. The competitive ratio degrades smoothly as a function of the quality and we show that this dependence is tight within constant factors.

Submitted 4 April, 2023; originally announced April 2023.

Comments: 31 pages, 1 figure

arXiv:2211.16316 [pdf, other]

A3T: Accuracy Aware Adversarial Training

Authors: Enes Altinisik, Safa Messaoud, Husrev Taha Sencar, Sanjay Chawla

Abstract: Adversarial training has been empirically shown to be more prone to overfitting than standard training. The exact underlying reasons still need to be fully understood. In this paper, we identify one cause of overfitting related to current practices of generating adversarial samples from misclassified samples. To address this, we propose an alternative approach that leverages the misclassified samples to mitigate the overfitting problem. We show that our approach achieves better generalization while having comparable robustness to state-of-the-art adversarial training methods on a wide range of computer vision, natural language processing, and tabular tasks.

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.10873 [pdf, other]

Interpretable Scientific Discovery with Symbolic Regression: A Review

Authors: Nour Makke, Sanjay Chawla

Abstract: Symbolic regression is emerging as a promising machine learning method for learning succinct underlying interpretable mathematical expressions directly from data. Whereas it has been traditionally tackled with genetic programming, it has recently gained a growing interest in deep learning as a data-driven model discovery method, achieving significant advances in various application domains ranging from fundamental to applied sciences. This survey presents a structured and comprehensive overview of symbolic regression methods and discusses their strengths and limitations.

Submitted 2 May, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

arXiv:2211.05523 [pdf, other]

Impact of Adversarial Training on Robustness and Generalizability of Language Models

Authors: Enes Altinisik, Hassan Sajjad, Husrev Taha Sencar, Safa Messaoud, Sanjay Chawla

Abstract: Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a trade-off. The goal of this work is to provide an in-depth comparison of different approaches for adversarial training in language models. Specifically, we study the effect of pre-training data augmentation as well as training-time input perturbations vs. embedding space perturbations on the robustness and generalization of transformer-based language models. Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input space perturbation. However, training with embedding space perturbation significantly improves generalization. A linguistic correlation analysis of neurons of the learned models reveals that the improved generalization is due to 'more specialized' neurons. To the best of our knowledge, this is the first work to carry out a deep qualitative analysis of different methods of generating adversarial examples in adversarial training of language models.

Submitted 10 December, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

arXiv:2210.01797 [pdf, other]

Ten Years after ImageNet: A 360° Perspective on AI

Authors: Sanjay Chawla, Preslav Nakov, Ahmed Ali, Wendy Hall, Issa Khalil, Xiaosong Ma, Husrev Taha Sencar, Ingmar Weber, Michael Wooldridge, Ting Yu

Abstract: It is ten years since neural networks made their spectacular comeback. Prompted by this anniversary, we take a holistic perspective on Artificial Intelligence (AI). Supervised Learning for cognitive tasks is effectively solved - provided we have enough high-quality labeled data. However, deep neural network models are not easily interpretable, and thus the debate between blackbox and whitebox modeling has come to the fore. The rise of attention networks, self-supervised learning, generative modeling, and graph neural networks has widened the application space of AI. Deep Learning has also propelled the return of reinforcement learning as a core building block of autonomous decision making systems. The possible harms made possible by new AI technologies have raised socio-technical issues such as transparency, fairness, and accountability. The dominance of AI by Big-Tech who control talent, computing resources, and most importantly, data may lead to an extreme AI divide. Failure to meet high expectations in high-profile and much-heralded flagship projects like self-driving vehicles could trigger another AI winter.

Submitted 30 September, 2022; originally announced October 2022.

arXiv:2209.09275 [pdf, other]

doi 10.1007/JHEP04(2023)005

Aligned Fields Double Copy to Kerr-NUT-(A)dS

Authors: Samarth Chawla, Cynthia Keeler

Abstract: We find Abelian gauge fields that double copy to a large class of black hole spacetimes with spherical horizon topology known as the Kerr-NUT-(A)dS family. Using a multi-Kerr-Schild prescription, we extend the previously-known double copy structure for arbitrarily rotating general dimension black holes, to include NUT charges and an arbitrary cosmological constant. In all cases, these single copy gauge fields are 'aligned fields', because their nonzero components align with the principal tensor which generates the Killing structure of the spacetime. In five dimensions, we additionally derive the same single-copy field strengths via the Weyl double copy procedure.

Submitted 18 October, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

Comments: 28 pages. v2: Added missing reference and fixed bibliography formatting

Journal ref: J. High Energ. Phys. 2023, 5 (2023)

arXiv:2204.04136 [pdf, ps, other]

Individually-Fair Auctions for Multi-Slot Sponsored Search

Authors: Shuchi Chawla, Rojin Rezvan, Nathaniel Sauerberg

Abstract: We design fair sponsored search auctions that achieve a near-optimal tradeoff between fairness and quality. Our work builds upon the model and auction design of Chawla and Jagadeesan \cite{CJ22}, who considered the special case of a single slot. We consider sponsored search settings with multiple slots and the standard model of click through rates that are multiplicatively separable into an advertiser-specific component and a slot-specific component. When similar users have similar advertiser-specific click through rates, our auctions achieve the same near-optimal tradeoff between fairness and quality as in \cite{CJ22}. When similar users can have different advertiser-specific preferences, we show that a preference-based fairness guarantee holds. Finally, we provide a computationally efficient algorithm for computing payments for our auctions as well as those in previous work, resolving another open direction from \cite{CJ22}.

Submitted 8 April, 2022; originally announced April 2022.

arXiv:2204.01962 [pdf, ps, other]

Buy-Many Mechanisms for Many Unit-Demand Buyers

Authors: Shuchi Chawla, Rojin Rezvan, Yifeng Teng, Christos Tzamos

Abstract: A recent line of research has established a novel desideratum for designing approximately-revenue-optimal multi-item mechanisms, namely the buy-many constraint. Under this constraint, prices for different allocations made by the mechanism must be subadditive, implying that the price of a bundle cannot exceed the sum of prices of individual items it contains. This natural constraint has enabled several positive results in multi-item mechanism design bypassing well-established impossibility results. Our work addresses the main open question from this literature of extending the buy-many constraint to multiple buyer settings and developing an approximation. We propose a new revenue benchmark for multi-buyer mechanisms via an ex-ante relaxation that captures several different ways of extending the buy-many constraint to the multi-buyer setting. Our main result is that a simple sequential item pricing mechanism with buyer-specific prices can achieve an $O(\log m)$ approximation to this revenue benchmark when all buyers have unit-demand or additive preferences over m items. This is the best possible as it directly matches the previous results for the single-buyer setting where no simple mechanism can obtain a better approximation. From a technical viewpoint we make two novel contributions. First, we develop a supply-constrained version of buy-many approximation for a single buyer. Second, we develop a multi-dimensional online contention resolution scheme for unit-demand buyers that may be of independent interest in mechanism design.

Submitted 16 May, 2024; v1 submitted 4 April, 2022; originally announced April 2022.

arXiv:2203.17259 [pdf, other]

To ArXiv or not to ArXiv: A Study Quantifying Pros and Cons of Posting Preprints Online

Authors: Charvi Rastogi, Ivan Stelmakh, Xinwei Shen, Marina Meila, Federico Echenique, Shuchi Chawla, Nihar B. Shah

Abstract: Double-blind conferences have engaged in debates over whether to allow authors to post their papers online on arXiv or elsewhere during the review process. Independently, some authors of research papers face the dilemma of whether to put their papers on arXiv due to its pros and cons. We conduct a study to substantiate this debate and dilemma via quantitative measurements. Specifically, we conducted surveys of reviewers in two top-tier double-blind computer science conferences -- ICML 2021 (5361 submissions and 4699 reviewers) and EC 2021 (498 submissions and 190 reviewers). Our two main findings are as follows. First, more than a third of the reviewers self-report searching online for a paper they are assigned to review. Second, outside the review process, we find that preprints from better-ranked affiliations see a weakly higher visibility, with a correlation of 0.06 in ICML and 0.05 in EC. In particular, papers associated with the top-10-ranked affiliations had a visibility of approximately 11% in ICML and 22% in EC, whereas the remaining papers had a visibility of 7% and 18% respectively.

Submitted 11 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: 17 pages, 3 figures

arXiv:2203.17239 [pdf, other]

doi 10.1371/journal.pone.0283980

Cite-seeing and Reviewing: A Study on Citation Bias in Peer Review

Authors: Ivan Stelmakh, Charvi Rastogi, Ryan Liu, Shuchi Chawla, Federico Echenique, Nihar B. Shah

Abstract: Citations play an important role in researchers' careers as a key factor in evaluation of scientific impact. Many anecdotes advise authors to exploit this fact and cite prospective reviewers to try obtaining a more positive evaluation for their submission. In this work, we investigate if such a citation bias actually exists: Does the citation of a reviewer's own work in a submission cause them to be positively biased towards the submission? In conjunction with the review process of two flagship conferences in machine learning and algorithmic economics, we execute an observational study to test for citation bias in peer review. In our analysis, we carefully account for various confounding factors such as paper quality and reviewer expertise, and apply different modeling techniques to alleviate concerns regarding the model mismatch. Overall, our analysis involves 1,314 papers and 1,717 reviewers and detects citation bias in both venues we consider. In terms of the effect size, by citing a reviewer's work, a submission has a non-trivial chance of getting a higher score from the reviewer: an expected increase in the score is approximately 0.23 on a 5-point Likert item. For reference, a one-point increase of a score by a single reviewer improves the position of a submission by 11% on average.

Submitted 31 March, 2022; originally announced March 2022.

Comments: 19 pages, 3 figures

arXiv:2202.06683 [pdf, ps, other]

Collective wind farm operation based on a predictive model increases utility-scale energy production

Authors: Michael F. Howland, Jesus Bas Quesada, Juan Jose Pena Martinez, Felipe Palou Larranaga, Neeraj Yadav, Jasvipul S. Chawla, Varun Sivaram, John O. Dabiri

Abstract: Wind turbines located in wind farms are operated to maximize only their own power production. Individual operation results in wake losses that reduce farm energy. In this study, we operate a wind turbine array collectively to maximize total array production through wake steering. The selection of the farm control strategy relies on the optimization of computationally efficient flow models. We develop a physics-based, data-assisted flow control model to predict the optimal control strategy. In contrast to previous studies, we first design and implement a multi-month field experiment at a utility-scale wind farm to validate the model over a range of control strategies, most of which are suboptimal. The flow control model is able to predict the optimal yaw misalignment angles for the array within +/- 5 degrees for most wind directions (11-32% power gains). Using the validated model, we design a control protocol which increases the energy production of the farm in a second multi-month experiment by 2.7% and 1.0%, for the wind directions of interest and for wind speeds between 6 and 8 m/s and all wind speeds, respectively. The developed and validated predictive model can enable a wider adoption of collective wind farm operation.

Submitted 26 January, 2022; originally announced February 2022.

Comments: 14 pages, 5 figures, 1 table

arXiv:2201.02381 [pdf, other]

Offline Reinforcement Learning for Road Traffic Control

Authors: Mayuresh Kunjir, Sanjay Chawla

Abstract: Traffic signal control is an important problem in urban mobility with a significant potential of economic and environmental impact. While there is a growing interest in Reinforcement Learning (RL) for traffic signal control, the work so far has focussed on learning through simulations which could lead to inaccuracies due to simplifying assumptions. Instead, real experience data on traffic is available and could be exploited at minimal costs. Recent progress in offline or batch RL has enabled just that. Model-based offline RL methods, in particular, have been shown to generalize from the experience data much better than others. We build a model-based learning framework which infers a Markov Decision Process (MDP) from a dataset collected using a cyclic traffic signal control policy that is both commonplace and easy to gather. The MDP is built with pessimistic costs to manage out-of-distribution scenarios using an adaptive shaping of rewards which is shown to provide better regularization compared to the prior related work in addition to being PAC-optimal. Our model is evaluated on a complex signalized roundabout showing that it is possible to build highly performant traffic control policies in a data efficient manner.

Submitted 11 December, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: 30 pages

ACM Class: I.2.1

arXiv:2112.14730 [pdf, ps, other]

doi 10.1103/PhysRevD.107.066024

Quantum Gravity Corrections to the Fall of the Apple

Authors: Samarth Chawla, Maulik Parikh

Abstract: We consider the motion of a massive particle in a static, weakly-curved spacetime where the gravitational field is taken to be quantized. We find that Newton's law of free-fall is modified by quantum-gravitational corrections, in addition to the known special-relativistic and post-Newtonian modifications. The quantum-gravitational corrections take the form of stochastic noise in the particle trajectory, where the statistical properties of the noise depend on the quantum state of the gravitational field.

Submitted 10 March, 2023; v1 submitted 29 December, 2021; originally announced December 2021.

Comments: 13 pages, LaTeX. v3: references added, expanded discussion

Journal ref: Phys. Rev. D 107, 066024 (2023)

arXiv:2112.10028 [pdf, other]

doi 10.1145/3627106.3627199

Attack of the Knights: A Non Uniform Cache Side-Channel Attack

Authors: Farabi Mahmud, Sungkeun Kim, Harpreet Singh Chawla, Chia-Che Tsai, Eun Jung Kim, Abdullah Muzahid

Abstract: For a distributed last-level cache (LLC) in a large multicore chip, the access time to one LLC bank can significantly differ from that to another due to the difference in physical distance. In this paper, we successfully demonstrated a new distance-based side-channel attack by timing the AES decryption operation and extracting part of an AES secret key on an Intel Knights Landing CPU. We introduce several techniques to overcome the challenges of the attack, including the use of multiple attack threads to ensure LLC hits, to detect vulnerable memory locations, and to obtain fine-grained timing of the victim operations. While operating as a covert channel, this attack can reach a bandwidth of 205 kbps with an error rate of only 0.02%. We also observed that the side-channel attack can extract 4 bytes of an AES key with 100% accuracy with only 4000 trial rounds of encryption.

Submitted 31 May, 2023; v1 submitted 18 December, 2021; originally announced December 2021.

Journal ref: Annual Computer Security Applications Conference ACSAC 2023

arXiv:2110.06456 [pdf, other]

doi 10.1145/3474717.3483651

Updating Street Maps using Changes Detected in Satellite Imagery

Authors: Favyen Bastani, Songtao He, Satvat Jagwani, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden, Mohammad Amin Sadeghi

Abstract: Accurately maintaining digital street maps is labor-intensive. To address this challenge, much work has studied automatically processing geospatial data sources such as GPS trajectories and satellite images to reduce the cost of maintaining digital maps. An end-to-end map update system would first process geospatial data sources to extract insights, and second leverage those insights to update and improve the map. However, prior work largely focuses on the first step of this pipeline: these map extraction methods infer road networks from scratch given geospatial data sources (in effect creating entirely new maps), but do not address the second step of leveraging this extracted information to update the existing digital map data. In this paper, we first explain why current map extraction techniques yield low accuracy when extended to update existing maps. We then propose a novel method that leverages the progression of satellite imagery over time to substantially improve accuracy. Our approach first compares satellite images captured at different times to identify portions of the physical road network that have visibly changed, and then updates the existing map accordingly. We show that our change-based approach reduces map update error rates four-fold.

Submitted 12 October, 2021; originally announced October 2021.

Comments: SIGSPATIAL 2021

arXiv:2108.12976 [pdf, ps, other]

Approximating Pandora's Box with Correlations

Authors: Shuchi Chawla, Evangelia Gergatsouli, Jeremy McMahan, Christos Tzamos

Abstract: We revisit the classic Pandora's Box (PB) problem under correlated distributions on the box values. Recent work of arXiv:1911.01632 obtained constant approximate algorithms for a restricted class of policies for the problem that visit boxes in a fixed order. In this work, we study the complexity of approximating the optimal policy which may adaptively choose which box to visit next based on the values seen so far. Our main result establishes an approximation-preserving equivalence of PB to the well studied Uniform Decision Tree (UDT) problem from stochastic optimization and a variant of the Min-Sum Set Cover ($\text{MSSC}_f$) problem. For distributions of support $m$, UDT admits a $\log m$ approximation, and while a constant factor approximation in polynomial time is a long-standing open problem, constant factor approximations are achievable in subexponential time (arXiv:1906.11385). Our main result implies that the same properties hold for PB and $\text{MSSC}_f$. We also study the case where the distribution over values is given more succinctly as a mixture of $m$ product distributions. This problem is again related to a noisy variant of the Optimal Decision Tree which is significantly more challenging. We give a constant-factor approximation that runs in time $n^{ \tilde O( m^2/\varepsilon^2 ) }$ when the mixture components on every box are either identical or separated in TV distance by $\varepsilon$.

Submitted 21 July, 2023; v1 submitted 29 August, 2021; originally announced August 2021.

arXiv:2107.02846 [pdf]

Visions in Theoretical Computer Science: A Report on the TCS Visioning Workshop 2020

Authors: Shuchi Chawla, Jelani Nelson, Chris Umans, David Woodruff

Abstract: Theoretical computer science (TCS) is a subdiscipline of computer science that studies the mathematical foundations of computational and algorithmic processes and interactions. Work in this field is often recognized by its emphasis on mathematical technique and rigor. At the heart of the field are questions surrounding the nature of computation: What does it mean to compute? What is computable? And how efficiently? Every ten years or so the TCS community attends visioning workshops to discuss the challenges and recent accomplishments in the TCS field. The workshops and the outputs they produce are meant both as a reflection for the TCS community and as guiding principles for interested investment partners. Concretely, the workshop output consists of a number of nuggets, each summarizing a particular point, that are synthesized in the form of a white paper and illustrated with graphics/slides produced by a professional graphic designer. The second TCS Visioning Workshop was organized by the SIGACT Committee for the Advancement of Theoretical Computer Science and took place during the week of July 20, 2020. Despite the conference being virtual, there were over 76 participants, mostly from the United States, but also a few from Europe and Asia who were able to attend due to the online format. Workshop participants were divided into categories as reflected in the sections of this report: (1) models of computation; (2) foundations of data science; (3) cryptography; and (4) using theoretical computer science for other domains. Each group participated in a series of discussions that produced the nuggets below.

Submitted 6 July, 2021; originally announced July 2021.

Comments: A Computing Community Consortium (CCC) workshop report, 36 pages

Report number: ccc2021report_2

arXiv:2106.04704 [pdf, ps, other]

Pricing Ordered Items

Authors: Shuchi Chawla, Rojin Rezvan, Yifeng Teng, Christos Tzamos

Abstract: We study the revenue guarantees and approximability of item pricing. Recent work shows that with $n$ heterogeneous items, item-pricing guarantees an $O(\log n)$ approximation to the optimal revenue achievable by any (buy-many) mechanism, even when buyers have arbitrarily combinatorial valuations. However, finding good item prices is challenging -- it is known that even under unit-demand valuations, it is NP-hard to find item prices that approximate the revenue of the optimal item pricing better than $O(\sqrt{n})$. Our work provides a more fine-grained analysis of the revenue guarantees and computational complexity in terms of the number of item ``categories'' which may be significantly fewer than $n$. We assume the items are partitioned in $k$ categories so that items within a category are totally-ordered and a buyer's value for a bundle depends only on the best item contained from every category. We show that item-pricing guarantees an $O(\log k)$ approximation to the optimal (buy-many) revenue and provide a PTAS for computing the optimal item-pricing when $k$ is constant. We also provide a matching lower bound showing that the problem is (strongly) NP-hard even when $k=1$. Our results naturally extend to the case where items are only partially ordered, in which case the revenue guarantees and computational complexity depend on the width of the partial ordering, i.e. the largest set for which no two items are comparable.

Submitted 4 November, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

arXiv:2104.01063 [pdf, other]

Permutation-Invariant Subgraph Discovery

Authors: Raghvendra Mall, Shameem A. Parambath, Han Yufei, Ting Yu, Sanjay Chawla

Abstract: We introduce Permutation and Structured Perturbation Inference (PSPI), a new problem formulation that abstracts many graph matching tasks that arise in systems biology. PSPI can be viewed as a robust formulation of the permutation inference or graph matching, where the objective is to find a permutation between two graphs under the assumption that a set of edges may have undergone a perturbation due to an underlying cause. For example, suppose there are two gene regulatory networks X and Y from a diseased and normal tissue respectively. Then, the PSPI problem can be used to detect if there has been a structural change between the two networks which can serve as a signature of the disease. Besides the new problem formulation, we propose an ADMM algorithm (STEPD) to solve a relaxed version of the PSPI problem. An extensive case study on comparative gene regulatory networks (GRNs) is used to demonstrate that STEPD is able to accurately infer structured perturbations and thus provides a tool for computational biologists to identify novel prognostic signatures. A spectral analysis confirms that STEPD can recover small clique-like perturbations making it a useful tool for detecting permutation-invariant changes in graphs.

Submitted 2 April, 2021; originally announced April 2021.

Comments: 8 pages, 4 Figures, 2 Tables

arXiv:2012.12394 [pdf, other]

Probabilistic Outlier Detection and Generation

Authors: Stefano Giovanni Rizzo, Linsey Pang, Yixian Chen, Sanjay Chawla

Abstract: A new method for outlier detection and generation is introduced by lifting data into the space of probability distributions which are not analytically expressible, but from which samples can be drawn using a neural generator. Given a mixture of unknown latent inlier and outlier distributions, a Wasserstein double autoencoder is used to both detect and generate inliers and outliers. The proposed method, named WALDO (Wasserstein Autoencoder for Learning the Distribution of Outliers), is evaluated on classical data sets including MNIST, CIFAR10 and KDD99 for detection accuracy and robustness. We give an example of outlier detection on a real retail sales data set and an example of outlier generation for simulating intrusion attacks. However, we foresee many application scenarios where WALDO can be used. To the best of our knowledge this is the first work that studies both outlier detection and generation together.

Submitted 22 December, 2020; originally announced December 2020.

arXiv:2011.09406 [pdf, ps, other]

Non-Adaptive Matroid Prophet Inequalities

Authors: Shuchi Chawla, Kira Goldner, Anna R. Karlin, J. Benjamin Miller

Abstract: We investigate non-adaptive algorithms for matroid prophet inequalities. Matroid prophet inequalities have been considered resolved since 2012 when [KW12] introduced thresholds that guarantee a tight 2-approximation to the prophet; however, this algorithm is adaptive. Other approaches of [CHMS10] and [FSZ16] have used non-adaptive thresholds with a feasibility restriction; however, this translates to adaptively changing an item's threshold to infinity when it cannot be taken with respect to the additional feasibility constraint, hence the algorithm is not truly non-adaptive. A major application of prophet inequalities is in auction design, where non-adaptive prices possess a significant advantage: they convert to order-oblivious posted pricings, and are essential for translating a prophet inequality into a truthful mechanism for multi-dimensional buyers. The existing matroid prophet inequalities do not suffice for this application. We present the first non-adaptive constant-factor prophet inequality for graphic matroids.

Submitted 18 November, 2020; originally announced November 2020.

arXiv:2010.03289 [pdf, other]

doi 10.1145/3397536.3422274

QarSUMO: A Parallel, Congestion-optimized Traffic Simulator

Authors: Hao Chen, Ke Yang, Stefano Giovanni Rizzo, Giovanna Vantini, Phillip Taylor, Xiaosong Ma, Sanjay Chawla

Abstract: Traffic simulators are important tools for tasks such as urban planning and transportation management. Microscopic simulators allow per-vehicle movement simulation, but require longer simulation time. The simulation overhead is exacerbated when there is traffic congestion and most vehicles move slowly. This in particular hurts the productivity of emerging urban computing studies based on reinforcement learning, where traffic simulations are heavily and repeatedly used for designing policies to optimize traffic related tasks. In this paper, we develop QarSUMO, a parallel, congestion-optimized version of the popular SUMO open-source traffic simulator. QarSUMO performs high-level parallelization on top of SUMO, to utilize powerful multi-core servers and enables future extension to multi-node parallel simulation if necessary. The proposed design, while partly sacrificing speedup, makes QarSUMO compatible with future SUMO improvements. We further contribute such an improvement by modifying the SUMO simulation engine for congestion scenarios where the update computation of consecutive and slow-moving vehicles can be simplified. We evaluate QarSUMO with both real-world and synthetic road network and traffic data, and examine its execution time as well as simulation accuracy relative to the original, sequential SUMO.

Submitted 21 October, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: Fix a typo in Figure 9

ACM Class: C.1.4; H.4.0

arXiv:2009.08100 [pdf, other]

How-to Present News on Social Media: A Causal Analysis of Editing News Headlines for Boosting User Engagement

Authors: Kunwoo Park, Haewoon Kwak, Jisun An, Sanjay Chawla

Abstract: To reach a broader audience and optimize traffic toward news articles, media outlets commonly run social media accounts and share their content with a short text summary. Despite the importance of writing a compelling message when sharing articles, the research community lacks a sufficient understanding of what kinds of editing strategies effectively promote audience engagement. In this study, we aim to fill the gap by analyzing media outlets' current practices using a data-driven approach. We first build a parallel corpus of original news articles and their corresponding tweets that eight media outlets shared. Then, we explore how those media edited tweets against original headlines and the effects of such changes. To estimate the effects of editing news headlines for social media sharing in audience engagement, we present a systematic analysis that incorporates a causal inference technique with deep learning; using propensity score matching, it allows for estimating potential (dis-)advantages of an editing style compared to counterfactual cases where a similar news article is shared with a different style. According to the analyses of various editing styles, we report common and differing effects of the styles across the outlets. To understand the effects of various editing styles, media outlets could apply our easy-to-use tool by themselves.

Submitted 21 April, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: ICWSM'21 full paper

arXiv:2008.02467 [pdf]

Unravelling the Architecture of Membrane Proteins with Conditional Random Fields

Authors: Lior Lukov, Sanjay Chawla, Wei Liu, Brett Church, Gaurav Pandey

Abstract: In this paper, we will show that the recently introduced graphical model: Conditional Random Fields (CRF) provides a template to integrate micro-level information about biological entities into a mathematical model to understand their macro-level behavior. More specifically, we will apply the CRF model to an important classification problem in protein science, namely the secondary structure prediction of proteins based on the observed primary structure. A comparison on benchmark data sets against twenty-eight other methods shows that not only does the CRF model lead to extremely accurate predictions but the modular nature of the model and the freedom to integrate disparate, overlapping and non-independent sources of information, makes the model an extremely versatile tool to potentially solve many other problems in bioinformatics.

Submitted 6 August, 2020; originally announced August 2020.

Comments: See the originally compiled PDF of this paper at: https://drive.google.com/file/d/1IYF52Wk8m96KIlrQHUVtEBdm0Kw3M40c

arXiv:2008.00873 [pdf, ps, other]

Influence of atmospheric conditions on the power production of utility-scale wind turbines in yaw misalignment

Authors: Michael F. Howland, Carlos Moral Gonzalez, Juan Jose Pena Martinez, Jesus Bas Quesada, Felipe Palou Larranaga, Neeraj K. Yadav, Jasvipul S. Chawla, John O. Dabiri

Abstract: The intentional yaw misalignment of leading, upwind turbines in a wind farm, termed wake steering, has demonstrated potential as a collective control approach for wind farm power maximization. The optimal control strategy, and resulting effect of wake steering on wind farm power production, are in part dictated by the power degradation of the upwind yaw misaligned wind turbines. In the atmospheric boundary layer, the wind speed and direction may vary significantly over the wind turbine rotor area, depending on atmospheric conditions and stability, resulting in freestream turbine power production which is asymmetric as a function of the direction of yaw misalignment and which varies during the diurnal cycle. In this study, we propose a model for the power production of a wind turbine in yaw misalignment based on aerodynamic blade elements which incorporates the effects of wind speed and direction changes over the turbine rotor area in yaw misalignment. A field experiment is performed using multiple utility-scale wind turbines to characterize the power production of yawed freestream operating turbines depending on the wind conditions, and the model is validated using the experimental data. The resulting power production of a yaw misaligned variable speed wind turbine depends on a nonlinear interaction between the yaw misalignment, the atmospheric conditions, and the wind turbine control system.

Submitted 3 August, 2020; v1 submitted 31 July, 2020; originally announced August 2020.

Comments: 37 pages, 15 figures

arXiv:2007.09547 [pdf, other]

Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding

Authors: Songtao He, Favyen Bastani, Satvat Jagwani, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Mohamed M. Elshrif, Samuel Madden, Amin Sadeghi

Abstract: Inferring road graphs from satellite imagery is a challenging computer vision task. Prior solutions fall into two categories: (1) pixel-wise segmentation-based approaches, which predict whether each pixel is on a road, and (2) graph-based approaches, which predict the road graph iteratively. We find that these two approaches have complementary strengths while suffering from their own inherent limitations. In this paper, we propose a new method, Sat2Graph, which combines the advantages of the two prior categories into a unified framework. The key idea in Sat2Graph is a novel encoding scheme, graph-tensor encoding (GTE), which encodes the road graph into a tensor representation. GTE makes it possible to train a simple, non-recurrent, supervised model to predict a rich set of features that capture the graph structure directly from an image. We evaluate Sat2Graph using two large datasets. We find that Sat2Graph surpasses prior methods on two widely used metrics, TOPO and APLS. Furthermore, whereas prior work only infers planar road graphs, our approach is capable of inferring stacked roads (e.g., overpasses), and does so robustly.

Submitted 18 July, 2020; originally announced July 2020.

Comments: ECCV 2020

arXiv:2007.07990 [pdf, other]

Static pricing for multi-unit prophet inequalities

Authors: Shuchi Chawla, Nikhil Devanur, Thodoris Lykouris

Abstract: We study a pricing problem where a seller has $k$ identical copies of a product, buyers arrive sequentially, and the seller prices the items aiming to maximize social welfare. When $k=1$, this is the so-called "prophet inequality" problem for which there is a simple pricing scheme achieving a competitive ratio of $1/2$. On the other end of the spectrum, as $k$ goes to infinity, the asymptotic performance of both static and adaptive pricing is well understood. We provide a static pricing scheme for the small-supply regime: where $k$ is small but larger than $1$. Prior to our work, the best competitive ratio known for this setting was the $1/2$ that follows from the single-unit prophet inequality. Our pricing scheme is easy to describe as well as practical -- it is anonymous, non-adaptive, and order-oblivious. We pick a single price that equalizes the expected fraction of items sold and the probability that the supply does not sell out before all customers are served; this price is then offered to each customer while supply lasts. This extends an approach introduced by Samuel-Cahn for the case of $k=1$. This pricing scheme achieves a competitive ratio that increases gradually with the supply. Subsequent work by Jiang, Ma, and Zhang shows that our pricing scheme is the optimal static pricing for every value of $k$.

Submitted 20 June, 2023; v1 submitted 15 July, 2020; originally announced July 2020.
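
The balancing rule described in the abstract above can be illustrated with a small sketch. This is not the authors' code: it assumes $n$ buyers with independent, identically distributed values (the abstract does not state this), it reads "the supply does not sell out" as "fewer than $k$ buyers accept the price", and it works with the acceptance probability $q = \Pr[\text{value} \ge \text{price}]$ rather than with the price itself. The function names `balance_gap` and `balancing_acceptance_prob` are hypothetical.

```python
from math import comb


def binom_pmf(n, q, j):
    """P(exactly j of n independent buyers accept), each with probability q."""
    return comb(n, j) * q**j * (1 - q) ** (n - j)


def balance_gap(n, k, q):
    """Expected fraction of the k units sold minus P(fewer than k buyers accept).

    Assumption: i.i.d. buyers, so the number of accepting buyers is Binomial(n, q).
    """
    pmf = [binom_pmf(n, q, j) for j in range(n + 1)]
    expected_sold = sum(min(j, k) * pmf[j] for j in range(n + 1))
    prob_not_sold_out = sum(pmf[:k])  # fewer than k acceptances
    return expected_sold / k - prob_not_sold_out


def balancing_acceptance_prob(n, k, tol=1e-12):
    """Bisect for the q at which the two quantities are equal (requires 1 <= k <= n).

    The gap is -1 at q = 0 and +1 at q = 1 and increases with q,
    so a single crossing exists and bisection finds it.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if balance_gap(n, k, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Under these assumptions, the static price is the $(1-q)$-quantile of the value distribution; for values uniform on $[0,1]$ that is simply $1-q$. For $k=1$ the condition reduces to $(1-q)^n = 1/2$, i.e. the price is the median of the maximum value, which matches the single-unit threshold the abstract attributes to Samuel-Cahn.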

arXiv:2003.13966 [pdf, other]

Individual Fairness in Advertising Auctions through Inverse Proportionality

Authors: Shuchi Chawla, Meena Jagadeesan

Abstract: Recent empirical work demonstrates that online advertisement can exhibit bias in the delivery of ads across users even when all advertisers bid in a non-discriminatory manner. We study the design of ad auctions that, given fair bids, are guaranteed to produce fair outcomes. Following the works of Dwork and Ilvento (2019) and Chawla et al. (2020), our goal is to design a truthful auction that satisfies "individual fairness" in its outcomes: informally speaking, users that are similar to each other should obtain similar allocations of ads. Within this framework we quantify the tradeoff between social welfare maximization and fairness. This work makes two conceptual contributions. First, we express the fairness constraint as a kind of stability condition: any two users that are assigned multiplicatively similar values by all the advertisers must receive additively similar allocations for each advertiser. This value stability constraint is expressed as a function that maps the multiplicative distance between value vectors to the maximum allowable $\ell_{\infty}$ distance between the corresponding allocations. Standard auctions do not satisfy this kind of value stability. Second, we introduce a new class of allocation algorithms called Inverse Proportional Allocation that achieve a near optimal tradeoff between fairness and social welfare for a broad and expressive class of value stability conditions. These allocation algorithms are truthful and prior-free, and achieve a constant factor approximation to the optimal (unconstrained) social welfare. In particular, the approximation ratio is independent of the number of advertisers in the system. In this respect, these allocation algorithms greatly surpass the guarantees achieved in previous work. We also extend our results to broader notions of fairness that we call subset fairness.

Submitted 30 November, 2021; v1 submitted 31 March, 2020; originally announced March 2020.

Comments: To appear at ITCS 2022; this is the full version

arXiv:2003.10636 [pdf, ps, other]

Menu-size Complexity and Revenue Continuity of Buy-many Mechanisms

Authors: Shuchi Chawla, Yifeng Teng, Christos Tzamos

Abstract: We study the multi-item mechanism design problem where a monopolist sells $n$ heterogeneous items to a single buyer. We focus on buy-many mechanisms, a natural class of mechanisms frequently used in practice. The buy-many property allows the buyer to interact with the mechanism multiple times instead of once as in the more commonly studied buy-one mechanisms. This imposes additional incentive constraints and thus restricts the space of mechanisms that the seller can use. In this paper, we explore the qualitative differences between buy-one and buy-many mechanisms focusing on two important properties: revenue continuity and menu-size complexity. Our first main result is that when the value function of the buyer is perturbed multiplicatively by a factor of $1\pmε$, the optimal revenue obtained by buy-many mechanisms only changes by a factor of $1 \pm \textrm{poly}(n,ε)$. In contrast, for buy-one mechanisms, the revenue of the resulting optimal mechanism after such a perturbation can change arbitrarily. Our second main result shows that under any distribution of arbitrary valuations, finite menu size suffices to achieve a $(1-ε)$-approximation to the optimal buy-many mechanism. We give tight upper and lower bounds on the number of menu entries as a function of $n$ and $ε$. On the other hand, such a result fails to hold for buy-one mechanisms as even for two items and a buyer with either unit-demand or additive valuations, the menu-size complexity of approximately optimal mechanisms is unbounded.

Submitted 23 March, 2020; originally announced March 2020.

arXiv:1912.12408 [pdf, other]

RoadTagger: Robust Road Attribute Inference with Graph Neural Networks

Authors: Songtao He, Favyen Bastani, Satvat Jagwani, Edward Park, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Samuel Madden, Mohammad Amin Sadeghi

Abstract: Inferring road attributes such as lane count and road type from satellite imagery is challenging. Often, due to the occlusion in satellite imagery and the spatial correlation of road attributes, a road attribute at one position on a road may only be apparent when considering far-away segments of the road. Thus, to robustly infer road attributes, the model must integrate scattered information and capture the spatial correlation of features along roads. Existing solutions that rely on image classifiers fail to capture this correlation, resulting in poor accuracy. We find this failure is caused by a fundamental limitation -- the limited effective receptive field of image classifiers. To overcome this limitation, we propose RoadTagger, an end-to-end architecture which combines both Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs) to infer road attributes. The usage of graph neural networks allows information propagation on the road network graph and eliminates the receptive field limitation of image classifiers. We evaluate RoadTagger on both a large real-world dataset covering 688 km^2 area in 20 U.S. cities and a synthesized micro-dataset. In the evaluation, RoadTagger improves inference accuracy over the CNN image classifier based approaches. RoadTagger also demonstrates strong robustness against different disruptions in the satellite imagery and the ability to learn complicated inductive rules for aggregating scattered information along the road network.

Submitted 28 December, 2019; originally announced December 2019.

arXiv:1911.01632 [pdf, ps, other]

Pandora's Box with Correlations: Learning and Approximation

Authors: Shuchi Chawla, Evangelia Gergatsouli, Yifeng Teng, Christos Tzamos, Ruimin Zhang

Abstract: The Pandora's Box problem and its extensions capture optimization problems with stochastic input where the algorithm can obtain instantiations of input random variables at some cost. To our knowledge, all previous work on this class of problems assumes that different random variables in the input are distributed independently. As such it does not capture many real-world settings. In this paper, we provide the first approximation algorithms for Pandora's Box-type problems with correlations. We assume that the algorithm has access to samples drawn from the joint distribution on input. Algorithms for these problems must determine an order in which to probe random variables, as well as when to stop and return the best solution found so far. In general, an optimal algorithm may make both decisions adaptively based on instantiations observed previously. Such fully adaptive (FA) strategies cannot be efficiently approximated to within any sublinear factor with sample access. We therefore focus on the simpler objective of approximating partially adaptive (PA) strategies that probe random variables in a fixed predetermined order but decide when to stop based on the instantiations observed. We consider a number of different feasibility constraints and provide simple PA strategies that are approximately optimal with respect to the best PA strategy for each case. All of our algorithms have polynomial sample complexity. We further show that our results are tight within constant factors: better factors cannot be achieved even using the full power of FA strategies.

Submitted 16 April, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

arXiv:1910.04869 [pdf, other]

Inferring and Improving Street Maps with Data-Driven Automation

Authors: Favyen Bastani, Songtao He, Satvat Jagwani, Edward Park, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden, Mohammad Amin Sadeghi

Abstract: Street maps are a crucial data source that helps to inform a wide range of decisions, from navigating a city to disaster relief and urban planning. However, in many parts of the world, street maps are incomplete or lag behind new construction. Editing maps today involves a tedious process of manually tracing and annotating roads, buildings, and other map features. Over the past decade, many automatic map inference systems have been proposed to automatically extract street map data from satellite imagery, aerial imagery, and GPS trajectory datasets. However, automatic map inference has failed to gain traction in practice due to two key limitations: high error rates (low precision), which manifest in noisy inference outputs, and a lack of end-to-end system design to leverage inferred data to update existing street maps. At MIT and QCRI, we have developed a number of algorithms and approaches to address these challenges, which we combined into a new system we call Mapster. Mapster is a human-in-the-loop street map editing system that incorporates three components to robustly accelerate the mapping process over traditional tools and workflows: high-precision automatic map inference, data refinement, and machine-assisted map editing. Through an evaluation on a large-scale dataset including satellite imagery, GPS trajectories, and ground-truth map data in forty cities, we show that Mapster makes automation practical for map editing, and enables the curation of map datasets that are more complete and up-to-date at less cost.

Submitted 6 November, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

arXiv:1909.00845 [pdf, other]

Revenue Maximization for Query Pricing

Authors: Shuchi Chawla, Shaleen Deep, Paraschos Koutris, Yifeng Teng

Abstract: Buying and selling of data online has increased substantially over the last few years. Several frameworks have already been proposed that study query pricing in theory and practice. The key guiding principle in these works is the notion of arbitrage-freeness where the broker can set different prices for different queries made to the dataset, but must ensure that the pricing function does not provide the buyers with opportunities for arbitrage. However, little is known about the revenue maximization aspect of query pricing. In this paper, we study the problem faced by a broker selling access to data with the goal of maximizing her revenue. We show that this problem can be formulated as a revenue maximization problem with single-minded buyers and unlimited supply, for which several approximation algorithms are known. We perform an extensive empirical evaluation of the performance of several pricing algorithms for the query pricing problem on real-world instances. In addition to previously known approximation algorithms, we propose several new heuristics and analyze them both theoretically and experimentally. Our experiments show that algorithms with the best theoretical bounds are not necessarily the best empirically. We identify algorithms and heuristics that are both fast and also provide consistently good performance when valuations are drawn from a variety of distributions.

Submitted 9 September, 2019; v1 submitted 2 September, 2019; originally announced September 2019.

Comments: To appear in PVLDB; version 2 with some cosmetic changes

arXiv:1907.01484 [pdf, other]

Themis: Fair and Efficient GPU Cluster Scheduling

Authors: Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, Shuchi Chawla

Abstract: Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs have long-running tasks that need to be gang-scheduled, and their performance is sensitive to tasks' relative placement. We propose Themis, a new scheduling framework for ML training workloads. Its GPU allocation policy enforces that ML workloads complete in a finish-time fair manner, a new notion we introduce. To capture placement sensitivity and ensure efficiency, Themis uses a two-level scheduling architecture where ML workloads bid on available resources that are offered in an auction run by a central arbiter. Our auction design allocates GPUs to winning bids by trading off efficiency for fairness in the short term but ensuring finish-time fairness in the long term. Our evaluation on a production trace shows that Themis can improve fairness by more than 2.25X and is ~5% to 250% more cluster efficient in comparison to state-of-the-art schedulers.

Submitted 29 October, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

arXiv:1906.08732 [pdf, other]

Multi-Category Fairness in Sponsored Search Auctions

Authors: Shuchi Chawla, Christina Ilvento, Meena Jagadeesan

Abstract: Fairness in advertising is a topic of particular concern motivated by theoretical and empirical observations in both the computer science and economics literature. We examine the problem of fairness in advertising for general purpose platforms that service advertisers from many different categories. First, we propose inter-category and intra-category fairness desiderata that take inspiration from individual fairness and envy-freeness. Second, we investigate the "platform utility" (a proxy for the quality of the allocation) achievable by mechanisms satisfying these desiderata. More specifically, we compare the utility of fair mechanisms against the unfair optimal, and we show by construction that our fairness desiderata are compatible with utility. That is, we construct a family of fair mechanisms with high utility that perform close to optimally within a class of fair mechanisms. Our mechanisms also enjoy nice implementation properties including metric-obliviousness, which allows the platform to produce fair allocations without needing to know the specifics of the fairness requirements.

Submitted 29 August, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

Comments: Updated version with revised and expanded content

arXiv:1906.07138 [pdf, other]

doi 10.1145/3274895.3274927

Machine-Assisted Map Editing

Authors: Favyen Bastani, Songtao He, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden

Abstract: Mapping road networks today is labor-intensive. As a result, road maps have poor coverage outside urban centers in many countries. Systems to automatically infer road network graphs from aerial imagery and GPS trajectories have been proposed to improve coverage of road maps. However, because of high error rates, these systems have not been adopted by mapping communities. We propose machine-assisted map editing, where automatic map inference is integrated into existing, human-centric map editing workflows. To realize this, we build Machine-Assisted iD (MAiD), where we extend the web-based OpenStreetMap editor, iD, with machine-assistance functionality. We complement MAiD with a novel approach for inferring road topology from aerial imagery that combines the speed of prior segmentation approaches with the accuracy of prior iterative graph construction methods. We design MAiD to tackle the addition of major, arterial roads in regions where existing maps have poor coverage, and the incremental improvement of coverage in regions where major roads are already mapped. We conduct two user studies and find that, when participants are given a fixed time to map roads, they are able to add as much as 3.5x more roads with MAiD.

Submitted 17 June, 2019; originally announced June 2019.

Journal ref: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pg 23-32, 2018

arXiv:1905.09130 [pdf, other]

AI-CARGO: A Data-Driven Air-Cargo Revenue Management System

Authors: Stefano Giovanni Rizzo, Ji Lucas, Zoi Kaoudi, Jorge-Arnulfo Quiane-Ruiz, Sanjay Chawla

Abstract: We propose AI-CARGO, a revenue management system for air-cargo that combines machine learning prediction with decision-making using mathematical optimization methods. AI-CARGO addresses a problem that is unique to the air-cargo business, namely the wide discrepancy between the quantity (weight or volume) that a shipper will book and the actual received amount at departure time by the airline. The discrepancy results in sub-optimal and inefficient behavior by both the shipper and the airline resulting in the overall loss of potential revenue for the airline. AI-CARGO also includes a data cleaning component to deal with the heterogeneous forms in which booking data is transmitted to the airline cargo system. AI-CARGO is deployed in the production environment of a large commercial airline company. We have validated the benefits of AI-CARGO using real and synthetic datasets. In particular, we have carried out simulations using dynamic programming techniques to elicit the impact on offloading costs and revenue generation of our proposed system. Our results suggest that combining prediction within a decision-making framework can help dramatically to reduce offloading costs and optimize revenue generation.

Submitted 22 May, 2019; originally announced May 2019.

Comments: 9 pages, 8 figures

arXiv:1904.05325 [pdf, other]

Risk Aware Ranking for Top-$k$ Recommendations

Authors: Shameem A Puthiya Parambath, Nishant Vijayakumar, Sanjay Chawla

Abstract: Given incomplete ratings data over a set of users and items, the preference completion problem aims to estimate a personalized total preference order over a subset of the items. In practical settings, a ranked list of top-$k$ items from the estimated preference order is recommended to the end user in the decreasing order of preference for final consumption. We analyze this model and observe that such a ranking model results in suboptimal performance when the payoff associated with the recommended items is different. We propose a novel and very efficient algorithm for the preference ranking considering the uncertainty regarding the payoffs of the items. Once the preference scores for the users are obtained using any preference learning algorithm, we show that ranking the items using a risk seeking utility function results in the best ranking performance.

Submitted 12 April, 2019; v1 submitted 17 March, 2019; originally announced April 2019.

Showing 1–50 of 119 results for author: Chawla, S