Search | arXiv e-print repository

arXiv:2312.09418 [pdf, other]

Predicting Multi-Joint Kinematics of the Upper Limb from EMG Signals Across Varied Loads with a Physics-Informed Neural Network

Authors: Rajnish Kumar, Suriya Prakash Muthukrishnan, Lalan Kumar, Sitikantha Roy

Abstract: In this research, we present an innovative method known as a physics-informed neural network (PINN) model to predict multi-joint kinematics using electromyography (EMG) signals recorded from the muscles surrounding these joints across various loads. The primary aim is to simultaneously predict both the shoulder and elbow joint angles while executing elbow flexion-extension (FE) movements, especially under varying load conditions. The PINN model is constructed by combining a feed-forward Artificial Neural Network (ANN) with a joint torque computation model. During the training process, the model utilizes a custom loss function derived from an inverse dynamics joint torque musculoskeletal model, along with a mean square angle loss. The training dataset for the PINN model comprises EMG and time data collected from four different subjects. To assess the model's performance, we conducted a comparison between the predicted joint angles and experimental data using a testing data set. The results demonstrated strong correlations of 58% to 83% in joint angle prediction. The findings highlight the potential of incorporating physical principles into the model, not only increasing its versatility but also enhancing its accuracy. The findings could have significant implications for the precise estimation of multi-joint kinematics in dynamic scenarios, particularly concerning the advancement of human-machine interfaces (HMIs) for exoskeletons and prosthetic control systems.

Submitted 28 November, 2023; originally announced December 2023.

arXiv:2308.03555 [pdf, other]

NeuroAiR: Deep Learning Framework for Airwriting Recognition from Scalp-recorded Neural Signals

Authors: Ayush Tripathi, Aryan Gupta, A. P. Prathosh, Suriya Prakash Muthukrishnan, Lalan Kumar

Abstract: Airwriting recognition is a task that involves identifying letters written in free space using finger movement. It is a special case of gesture recognition, where gestures correspond to letters in a specific language. Electroencephalography (EEG) is a non-invasive technique for recording brain activity and has been widely used in brain-computer interface applications. Leveraging EEG signals for airwriting recognition offers a promising alternative input method for Human-Computer Interaction. One key advantage of airwriting recognition is that users don't need to learn new gestures. By concatenating recognized letters, a wide range of words can be formed, making it applicable to a broader population. However, there has been limited research in the recognition of airwriting using EEG signals, which forms the core focus of this study. The NeuroAiR dataset comprising EEG signals recorded during writing English uppercase alphabets is first constructed. Various features are then explored in conjunction with different deep learning models to achieve accurate airwriting recognition. These features include processed EEG data, Independent Component Analysis components, source-domain-based scout time series, and spherical and head harmonic decomposition-based features. Furthermore, the impact of different EEG frequency bands on system performance is comprehensively investigated. The highest accuracy achieved in this study is 44.04% using Independent Component Analysis components and the EEGNet classification model. The results highlight the potential of EEG-based airwriting recognition as a user-friendly modality for alternative input methods in Human-Computer Interaction applications. This research sets a strong baseline for future advancements and demonstrates the viability and utility of EEG-based airwriting recognition.

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2306.17427 [pdf]

Modeling and parametric optimization of 3D tendon-sheath actuator system for upper limb soft exosuit

Authors: Amit Yadav, Nitesh Kumar, Shaurya Surana, Aravind Ramasamy, Abhishek Rudra Pal, Sushma Santapuri, Lalan Kumar, Suriya Prakash Muthukrishnan, Shubhendu Bhasin, Sitikantha Roy

Abstract: This paper presents an analysis of parametric characterization of a motor driven tendon-sheath actuator system for use in upper limb augmentation for applications such as rehabilitation, therapy, and industrial automation. The double tendon sheath system, which uses two sets of cables (agonist and antagonist side) guided through a sheath, is considered to produce smooth and natural-looking movements of the arm. The exoskeleton is equipped with a single motor capable of controlling both the flexion and extension motions. One of the key challenges in the implementation of a double tendon sheath system is the possibility of slack in the tendon, which can impact the overall performance of the system. To address this issue, a robust mathematical model is developed and a comprehensive parametric study is carried out to determine the most effective strategies for overcoming the problem of slack and improving the transmission. The study suggests that incorporating a series spring into the system's tendon leads to a universally applicable design, eliminating the need for individual customization. The results also show that the slack in the tendon can be effectively controlled by changing the pretension, spring constant, and size and geometry of spool mounted on the axle of motor.

Submitted 10 September, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

arXiv:2304.14823 [pdf, other]

Adaptive Gravity Compensation Control of a Cable-Driven Upper-Arm Soft Exosuit

Authors: Joyjit Mukherjee, Ankit Chatterjee, Shreeshan Jena, Nitesh Kumar, Suriya Prakash Muthukrishnan, Sitikantha Roy, Shubhendu Bhasin

Abstract: This paper proposes an adaptive gravity compensation (AGC) control strategy for a cable-driven upper-limb exosuit intended to assist the wearer with lifting tasks. Unlike most model-based control techniques used for this human-robot interaction task, the proposed control design does not assume knowledge of the anthropometric parameters of the wearer's arm and the payload. Instead, the uncertainties in human arm parameters, such as mass, length, and payload, are estimated online using an indirect adaptive control law that compensates for the gravity moment about the elbow joint. Additionally, the AGC controller is agnostic to the desired joint trajectory followed by the human arm. For the purpose of controller design, the human arm is modeled using a 1-DOF manipulator model. Further, a cable-driven actuator model is proposed that maps the assistive elbow torque to the actuator torque. The performance of the proposed method is verified through a co-simulation, wherein the control input realized in MATLAB is applied to the human bio-mechanical model in OpenSim under varying payload conditions. Significant reductions in human effort in terms of human muscle torque and metabolic cost are observed with the proposed control strategy. Further, simulation results show that the performance of the AGC controller converges to that of the gravity compensation (GC) controller, demonstrating the efficacy of AGC-based online parameter learning.

Submitted 28 April, 2023; originally announced April 2023.

arXiv:2301.03965 [pdf, other]

doi 10.1109/TIM.2023.3346505

BiCurNet: Pre-Movement EEG based Neural Decoder for Biceps Curl Trajectory Estimation

Authors: Manali Saini, Anant Jain, Lalan Kumar, Suriya Prakash Muthukrishnan, Shubhendu Bhasin, Sitikantha Roy

Abstract: Kinematic parameter (KP) estimation from early electroencephalogram (EEG) signals is essential for positive augmentation using wearable robot. However, work related to early estimation of KPs from surface EEG is sparse. In this work, a deep learning-based model, BiCurNet, is presented for early estimation of biceps curl using collected EEG signal. The model utilizes light-weight architecture with depth-wise separable convolution layers and customized attention module. The feasibility of early estimation of KPs is demonstrated using brain source imaging. Computationally efficient EEG features in spherical and head harmonics domain is utilized for the first time for KP prediction. The best Pearson correlation coefficient (PCC) between estimated and actual trajectory of $0.7$ is achieved when combined EEG features (spatial and harmonics domain) in delta band is utilized. Robustness of the proposed network is demonstrated for subject-dependent and subject-independent training, using EEG signals with artifacts.

Submitted 26 October, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

arXiv:2212.02870 [pdf, other]

TripCEAiR: A Multi-Loss minimization approach for surface EMG based Airwriting Recognition

Authors: Ayush Tripathi, Prathosh AP, Suriya Prakash Muthukrishnan, Lalan Kumar

Abstract: Airwriting Recognition refers to the problem of identification of letters written in space with movement of the finger. It can be seen as a special case of dynamic gesture recognition wherein the set of gestures are letters in a particular language. Surface Electromyography (sEMG) is a non-invasive approach used to capture electrical signals generated as a result of contraction and relaxation of the muscles. sEMG has been widely adopted for gesture recognition applications. Unlike static gestures, dynamic gestures are user-friendly and can be used as a method for input with applications in Human Computer Interaction. There has been limited work in recognition of dynamic gestures such as airwriting, using sEMG signals and forms the core of the current work. In this work, a multi-loss minimization framework for sEMG based airwriting recognition is proposed. The proposed framework aims at learning a feature embedding vector that minimizes the triplet loss, while simultaneously learning the parameters of a classifier head to recognize corresponding alphabets. The proposed method is validated on a dataset recorded in the lab comprising of sEMG signals from 50 participants writing English uppercase alphabets. The effect of different variations of triplet loss, triplet mining strategies and feature embedding dimension is also presented. The best-achieved accuracy was 81.26% and 65.62% in user-dependent and independent scenarios respectively by using semihard positive and hard negative triplet mining. The code for our implementation will be made available at https://github.com/ayushayt/TripCEAiR.

Submitted 19 March, 2023; v1 submitted 6 December, 2022; originally announced December 2022.
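The multi-loss idea in the TripCEAiR abstract (a triplet loss on the embedding combined with a classifier loss) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the squared Euclidean distance, the `margin` and `weight` parameters, and all function names are assumptions made for illustration only.

```python
import math

def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the positive closer than the negative by `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)

def cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against the integer class `label`."""
    top = max(logits)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[label]

def multi_loss(anchor, positive, negative, logits, label, margin=1.0, weight=1.0):
    """Assumed combination: triplet loss plus a weighted classification loss."""
    return triplet_loss(anchor, positive, negative, margin) + weight * cross_entropy(logits, label)
```

In this sketch, `weight` trades off embedding quality against classification accuracy; how the paper balances the two terms, and which triplet-loss variant it uses, is not stated in the abstract.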

arXiv:2210.17185 [pdf, other]

SurfMyoAiR: A surface Electromyography based framework for Airwriting Recognition

Authors: Ayush Tripathi, Lalan Kumar, Prathosh A. P., Suriya Prakash Muthukrishnan

Abstract: Airwriting Recognition is the task of identifying letters written in free space with finger movement. Electromyography (EMG) is a technique used to record electrical activity during muscle contraction and relaxation as a result of movement and is widely used for gesture recognition. Most of the current research in gesture recognition is focused on identifying static gestures. However, dynamic gestures are natural and user-friendly for being used as alternate input methods in Human-Computer Interaction applications. Airwriting recognition using EMG signals recorded from forearm muscles is therefore a viable solution. Since the user does not need to learn any new gestures and a large range of words can be formed by concatenating these letters, it is generalizable to a wider population. There has been limited work in recognition of airwriting using EMG signals and forms the core idea of the current work. The SurfMyoAiR dataset comprising of EMG signals recorded during writing English uppercase alphabets is constructed. Several different time-domain features to construct EMG envelope and two different time-frequency image representations: Short-Time Fourier Transform and Continuous Wavelet Transform were explored to form the input to a deep learning model for airwriting recognition. Several different deep learning architectures were exploited for this task. Additionally, the effect of various parameters such as signal length, window length and interpolation techniques on the recognition performance is comprehensively explored. The best-achieved accuracy was 78.50% and 62.19% in user-dependent and independent scenarios respectively by using Short-Time Fourier Transform in conjunction with a 2D Convolutional Neural Network based classifier. Airwriting has great potential as a user-friendly modality to be used as an alternate input method in Human-Computer Interaction applications.

Submitted 31 October, 2022; originally announced October 2022.

arXiv:2210.09924 [pdf, other]

Predicting Winning Regions in Parity Games via Graph Neural Networks (Extended Abstract)

Authors: Tobias Hecking, Swathy Muthukrishnan, Alexander Weinert

Abstract: Solving parity games is a major building block for numerous applications in reactive program verification and synthesis. While they can be solved efficiently in practice, no known approach has a polynomial worst-case runtime complexity. We present an incomplete polynomial-time approach to determining the winning regions of parity games via graph neural networks. Our evaluation on 900 randomly generated parity games shows that this approach is effective and efficient in practice. It correctly determines the winning regions of $\sim$60\% of the games in our data set and only incurs minor errors in the remaining ones. We believe that this approach can be extended to efficiently solve parity games as well.

Submitted 27 July, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: 4 pages, extended abstract. Presented at DAV'23

arXiv:2010.15620 [pdf, other]

doi 10.1145/3340531.3412038

CAFE: Coarse-to-Fine Neural Symbolic Reasoning for Explainable Recommendation

Authors: Yikun Xian, Zuohui Fu, Handong Zhao, Yingqiang Ge, Xu Chen, Qiaoying Huang, Shijie Geng, Zhou Qin, Gerard de Melo, S. Muthukrishnan, Yongfeng Zhang

Abstract: Recent research explores incorporating knowledge graphs (KG) into e-commerce recommender systems, not only to achieve better recommendation performance, but more importantly to generate explanations of why particular decisions are made. This can be achieved by explicit KG reasoning, where a model starts from a user node, sequentially determines the next step, and walks towards an item node of potential interest to the user. However, this is challenging due to the huge search space, unknown destination, and sparse signals over the KG, so informative and effective guidance is needed to achieve a satisfactory recommendation quality. To this end, we propose a CoArse-to-FinE neural symbolic reasoning approach (CAFE). It first generates user profiles as coarse sketches of user behaviors, which subsequently guide a path-finding process to derive reasoning paths for recommendations as fine-grained predictions. User profiles can capture prominent user behaviors from the history, and provide valuable signals about which kinds of path patterns are more likely to lead to potential items of interest for the user. To better exploit the user profiles, an improved path-finding algorithm called Profile-guided Path Reasoning (PPR) is also developed, which leverages an inventory of neural symbolic reasoning modules to effectively and efficiently find a batch of paths over a large-scale KG. We extensively experiment on four real-world benchmarks and observe substantial gains in the recommendation performance compared with state-of-the-art methods.

Submitted 29 October, 2020; originally announced October 2020.

Comments: Accepted in CIKM 2020

arXiv:2007.13207 [pdf, other]

Neural-Symbolic Reasoning over Knowledge Graph for Multi-stage Explainable Recommendation

Authors: Yikun Xian, Zuohui Fu, Qiaoying Huang, S. Muthukrishnan, Yongfeng Zhang

Abstract: Recent work on recommender systems has considered external knowledge graphs as valuable sources of information, not only to produce better recommendations but also to provide explanations of why the recommended items were chosen. Pure rule-based symbolic methods provide a transparent reasoning process over knowledge graph but lack generalization ability to unseen examples, while deep learning models enhance powerful feature representation ability but are hard to interpret. Moreover, direct reasoning over large-scale knowledge graph can be costly due to the huge search space of pathfinding. We approach the problem through a novel coarse-to-fine neural symbolic reasoning method called NSER. It first generates a coarse-grained explanation to capture abstract user behavioral pattern, followed by a fine-grained explanation accompanied by explicit reasoning paths and recommendations inferred from knowledge graph. We extensively experiment on four real-world datasets and observe substantial gains of recommendation performance compared with state-of-the-art methods as well as more diversified explanations in different granularity.

Submitted 26 July, 2020; originally announced July 2020.

Comments: Accepted at AAAI 2020 Workshop on Deep Learning on Graphs: Methodologies and Applications (DLGMA 20)

arXiv:1906.05237 [pdf, other]

doi 10.1145/3331184.3331203

Reinforcement Knowledge Graph Reasoning for Explainable Recommendation

Authors: Yikun Xian, Zuohui Fu, S. Muthukrishnan, Gerard de Melo, Yongfeng Zhang

Abstract: Recent advances in personalized recommendation have sparked great interest in the exploitation of rich structured information provided by knowledge graphs. Unlike most existing approaches that only focus on leveraging knowledge graphs for more accurate recommendation, we perform explicit reasoning with knowledge for decision making so that the recommendations are generated and supported by an interpretable causal inference procedure. To this end, we propose a method called Policy-Guided Path Reasoning (PGPR), which couples recommendation and interpretability by providing actual paths in a knowledge graph. Our contributions include four aspects. We first highlight the significance of incorporating knowledge graphs into recommendation to formally define and interpret the reasoning process. Second, we propose a reinforcement learning (RL) approach featuring an innovative soft reward strategy, user-conditional action pruning and a multi-hop scoring function. Third, we design a policy-guided graph search algorithm to efficiently and effectively sample reasoning paths for recommendation. Finally, we extensively evaluate our method on several large-scale real-world benchmark datasets, obtaining favorable results compared with state-of-the-art methods.

Submitted 12 June, 2019; originally announced June 2019.

Comments: Accepted in SIGIR 2019

arXiv:1904.09404 [pdf, ps, other]

Waterfall Bandits: Learning to Sell Ads Online

Authors: Branislav Kveton, Saied Mahdian, S. Muthukrishnan, Zheng Wen, Yikun Xian

Abstract: A popular approach to selling online advertising is by a waterfall, where a publisher makes sequential price offers to ad networks for an inventory, and chooses the winner in that order. The publisher picks the order and prices to maximize her revenue. A traditional solution is to learn the demand model and then subsequently solve the optimization problem for the given demand model. This will incur a linear regret. We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret. We evaluate the algorithm on both synthetic and real-world data, and show that it quickly learns high quality pricing strategies. This is the first principled study of learning a waterfall design online by sequential experimentation.

Submitted 20 April, 2019; originally announced April 2019.

arXiv:1804.10488 [pdf, other]

Offline Evaluation of Ranking Policies with Click Models

Authors: Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen

Abstract: Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data. The existing algorithms are not guaranteed to be statistically efficient in our problem because the number of recommended lists can grow exponentially with their length. To overcome this challenge, we use models of user interaction with the list of items, the so-called click models, to construct estimators that learn statistically efficiently. We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds. We evaluate our estimators in a series of experiments on a real-world dataset and show that they consistently outperform prior estimators.

Submitted 13 June, 2018; v1 submitted 27 April, 2018; originally announced April 2018.

arXiv:1712.04644 [pdf, other]

Stochastic Low-Rank Bandits

Authors: Branislav Kveton, Csaba Szepesvari, Anup Rao, Zheng Wen, Yasin Abbasi-Yadkori, S. Muthukrishnan

Abstract: Many problems in computer vision and recommender systems involve low-rank matrices. In this work, we study the problem of finding the maximum entry of a stochastic low-rank matrix from sequential observations. At each step, a learning agent chooses pairs of row and column arms, and receives the noisy product of their latent values as a reward. The main challenge is that the latent values are unobserved. We identify a class of non-negative matrices whose maximum entry can be found statistically efficiently and propose an algorithm for finding them, which we call LowRankElim. We derive a $O((K + L) \operatorname{poly}(d) \Delta^{-1} \log n)$ upper bound on its $n$-step regret, where $K$ is the number of rows, $L$ is the number of columns, $d$ is the rank of the matrix, and $\Delta$ is the minimum gap. The bound depends on other problem-specific constants that clearly do not depend on $K L$. To the best of our knowledge, this is the first such result in the literature.

Submitted 13 December, 2017; originally announced December 2017.

arXiv:1708.05159 [pdf, other]

Finding Subcube Heavy Hitters in Analytics Data Streams

Authors: Branislav Kveton, S. Muthukrishnan, Hoa T. Vu, Yikun Xian

Abstract: Data streams typically have items of large number of dimensions. We study the fundamental heavy-hitters problem in this setting. Formally, the data stream consists of $d$-dimensional items $x_1,\ldots,x_m \in [n]^d$. A $k$-dimensional subcube $T$ is a subset of distinct coordinates $\{ T_1,\cdots,T_k \} \subseteq [d]$. A subcube heavy hitter query ${\rm Query}(T,v)$, $v \in [n]^k$, outputs YES if $f_T(v) \geq γ$ and NO if $f_T(v) < γ/4$, where $f_T$ is the ratio of number of stream items whose coordinates $T$ have joint values $v$. The all subcube heavy hitters query ${\rm AllQuery}(T)$ outputs all joint values $v$ that return YES to ${\rm Query}(T,v)$. The one dimensional version of this problem where $d=1$ was heavily studied in data stream theory, databases, networking and signal processing. The subcube heavy hitters problem is applicable in all these cases. We present a simple reservoir sampling based one-pass streaming algorithm to solve the subcube heavy hitters problem in $\tilde{O}(kd/γ)$ space. This is optimal up to poly-logarithmic factors given the established lower bound. In the worst case, this is $Θ(d^2/γ)$ which is prohibitive for large $d$, and our goal is to circumvent this quadratic bottleneck. Our main contribution is a model-based approach to the subcube heavy hitters problem. In particular, we assume that the dimensions are related to each other via the Naive Bayes model, with or without a latent dimension. Under this assumption, we present a new two-pass, $\tilde{O}(d/γ)$-space algorithm for our problem, and a fast algorithm for answering ${\rm AllQuery}(T)$ in $O(k/γ^2)$ time. Our work develops the direction of model-based data stream analysis, with much that remains to be explored.

Submitted 20 February, 2018; v1 submitted 17 August, 2017; originally announced August 2017.

Comments: To appear in WWW 2018
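The reservoir-sampling baseline mentioned in the abstract above can be sketched roughly as follows. This is an illustrative reading only, not the paper's algorithm: the standard one-pass reservoir sampler (Algorithm R) stands in for the sampling step, the decision threshold of γ/2 (any value between γ/4 and γ would fit the query definition) is an assumption, and all function names are hypothetical.

```python
import random

def reservoir_sample(stream, size, rng):
    """Keep a uniform random sample of `size` items from a one-pass stream (Algorithm R)."""
    sample = []
    for seen, item in enumerate(stream, start=1):
        if len(sample) < size:
            sample.append(item)
        else:
            j = rng.randrange(seen)  # uniform in [0, seen)
            if j < size:
                sample[j] = item
    return sample

def subcube_frequency(sample, coords, values):
    """Fraction of sampled items whose coordinates `coords` take the joint values `values`."""
    if not sample:
        return 0.0
    target = tuple(values)
    hits = sum(1 for item in sample if tuple(item[c] for c in coords) == target)
    return hits / len(sample)

def query(sample, coords, values, gamma):
    """Answer YES (True) when the sampled frequency clears the assumed gamma/2 threshold."""
    return subcube_frequency(sample, coords, values) >= gamma / 2
```

For example, `query(reservoir_sample(stream, 1000, random.Random(0)), (0, 2), (5, 7), 0.1)` would estimate whether the pair of values (5, 7) on coordinates 0 and 2 is a heavy hitter at γ = 0.1. The sample size needed for the stated guarantee is not derived here.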

arXiv:1707.07334 [pdf, ps, other]

Testable Bounded Degree Graph Properties Are Random Order Streamable

Authors: Morteza Monemizadeh, S. Muthukrishnan, Pan Peng, Christian Sohler

Abstract: We study which property testing and sublinear time algorithms can be transformed into graph streaming algorithms for random order streams. Our main result is that for bounded degree graphs, any property that is constant-query testable in the adjacency list model can be tested with constant space in a single-pass in random order streams. Our result is obtained by estimating the distribution of local neighborhoods of the vertices on a random order graph stream using constant space. We then show that our approach can also be applied to constant time approximation algorithms for bounded degree graphs in the adjacency list model: As an example, we obtain a constant-space single-pass random order streaming algorithm for approximating the size of a maximum matching with additive error $εn$ ($n$ is the number of nodes). Our result establishes for the first time that a large class of sublinear algorithms can be simulated in random order streams, while $Ω(n)$ space is needed for many graph streaming problems for adversarial orders.

Submitted 23 July, 2017; originally announced July 2017.

Comments: A preliminary version was presented at the 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

arXiv:1611.04825 [pdf, other]

doi 10.1145/3050220.3050239

Heavy-Hitter Detection Entirely in the Data Plane

Authors: Vibhaalakshmi Sivaraman, Srinivas Narayana, Ori Rottenstreich, S. Muthukrishnan, Jennifer Rexford

Abstract: Identifying the "heavy hitter" flows or flows with large traffic volumes in the data plane is important for several applications, e.g., flow-size aware routing, DoS detection, and traffic engineering. However, measurement in the data plane is constrained by the need for line-rate processing (at 10-100Gb/s) and limited memory in switching hardware. We propose HashPipe, a heavy hitter detection algorithm using emerging programmable data planes. HashPipe implements a pipeline of hash tables which retain counters for heavy flows while evicting lighter flows over time. We prototype HashPipe in P4 and evaluate it with packet traces from an ISP backbone link and a data center. On the ISP trace (which contains over 400,000 flows), we find that HashPipe identifies 95% of the 300 heaviest flows with less than 80KB of memory.

Submitted 19 July, 2017; v1 submitted 15 November, 2016; originally announced November 2016.

Comments: SOSR 2017, Santa Clara, CA

arXiv:1608.03118 [pdf, ps, other]

The Sparse Awakens: Streaming Algorithms for Matching Size Estimation in Sparse Graphs

Authors: Graham Cormode, Hossein Jowhari, Morteza Monemizadeh, S. Muthukrishnan

Abstract: Estimating the size of the maximum matching is a canonical problem in graph algorithms, and one that has attracted extensive study over a range of different computational models. We present improved streaming algorithms for approximating the size of maximum matching with sparse (bounded arboricity) graphs. * Insert-Only Streams: We present a one-pass algorithm that takes O(c log^2 n) space and approximates the size of the maximum matching in graphs with arboricity c within a factor of O(c). This improves significantly on the state-of-the-art O~(cn^{2/3})-space streaming algorithms. * Dynamic Streams: Given a dynamic graph stream (i.e., inserts and deletes) of edges of an underlying c-bounded arboricity graph, we present a one-pass algorithm that uses space O~(c^{10/3}n^{2/3}) and returns an O(c)-estimator for the size of the maximum matching. This algorithm improves the state-of-the-art O~(cn^{4/5})-space algorithms, where the O~(.) notation hides logarithmic dependencies on $n$. In contrast to the previous works, our results take more advantage of the streaming access to the input and characterize the matching size based on the ordering of the edges in the stream in addition to the degree distributions and structural properties of the sparse graphs.

Submitted 14 November, 2016; v1 submitted 10 August, 2016; originally announced August 2016.

Comments: 12 pages

arXiv:1602.03105 [pdf, other]

Graphical Model Sketch

Authors: Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun

Abstract: Structured high-cardinality data arises in many domains, and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data but it does not scale well beyond a few variables. In this work, we bring together the ideas of graphical models and count sketches, and propose and analyze several approaches to estimating probabilities in structured high-cardinality streams of data. The key idea of our approximations is to use the structure of a graphical model and approximately estimate its factors by "sketches", which hash high-cardinality variables using random projections. Our approximations are computationally efficient and their space complexity is independent of the cardinality of variables. Our error bounds are multiplicative and significantly improve upon those of the CM sketch, a state-of-the-art approach to estimating probabilities in streams. We evaluate our approximations on synthetic and real-world problems, and report order-of-magnitude improvements over the CM sketch.

Submitted 18 July, 2016; v1 submitted 9 February, 2016; originally announced February 2016.

Comments: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases

arXiv:1409.5200 [pdf, ps, other]

The Shapley Value in Knapsack Budgeted Games

Authors: Smriti Bhagat, Anthony Kim, S. Muthukrishnan, Udi Weinsberg

Abstract: We propose the study of computing the Shapley value for a new class of cooperative games that we call budgeted games, and investigate in particular knapsack budgeted games, a version modeled after the classical knapsack problem. In these games, the "value" of a set $S$ of agents is determined only by a critical subset $T\subseteq S$ of the agents and not the entirety of $S$ due to a budget constraint that limits how large $T$ can be. We show that the Shapley value can be computed in time faster than by the naïve exponential time algorithm when there are sufficiently many agents, and also provide an algorithm that approximates the Shapley value within an additive error. For a related budgeted game associated with a greedy heuristic, we show that the Shapley value can be computed in pseudo-polynomial time. Furthermore, we generalize our proof techniques and propose what we term an algorithmic representation framework that captures a broad class of cooperative games with the property of efficient computation of the Shapley value. The main idea is that the problem of determining the efficient computation can be reduced to that of finding an alternative representation of the games and an associated algorithm for computing the underlying value function with small time and space complexities in the representation size.

Submitted 18 September, 2014; originally announced September 2014.

Comments: A short version to appear in the 10th Conference on Web and Internet Economics (WINE 2014)

arXiv:1407.2220 [pdf, ps, other]

Modeling Collaboration in Academia: A Game Theoretic Approach

Authors: Graham Cormode, Qiang Ma, S. Muthukrishnan, Brian Thompson

Abstract: In this work, we aim to understand the mechanisms driving academic collaboration. We begin by building a model for how researchers split their effort between multiple papers, and how collaboration affects the number of citations a paper receives, supported by observations from a large real-world publication and citation dataset, which we call the h-Reinvestment model. Using tools from the field of Game Theory, we study researchers' collaborative behavior over time under this model, with the premise that each researcher wants to maximize his or her academic success. We find analytically that there is a strong incentive to collaborate rather than work in isolation, and that studying collaborative behavior through a game-theoretic lens is a promising approach to help us better understand the nature and dynamics of academic collaboration.

Submitted 9 July, 2014; v1 submitted 8 July, 2014; originally announced July 2014.

Comments: Presented at the 1st WWW Workshop on Big Scholarly Data (2014). 6 pages, 5 figures

ACM Class: J.4

Journal ref: Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web (WWW 2014), pgs 1177-1182

arXiv:1407.1121 [pdf, ps, other]

Frugal Streaming for Estimating Quantiles: One (or two) memory suffices

Authors: Qiang Ma, S. Muthukrishnan, Mark Sandler

Abstract: Modern applications require processing streams of data for estimating statistical quantities such as quantiles with a small amount of memory. In many such applications, in fact, one needs to compute such statistical quantities for each of a large number of groups, which additionally restricts the amount of memory available for the stream for any particular group. We address this challenge and introduce frugal streaming, that is, algorithms that work with a tiny -- typically, sub-streaming -- amount of memory per group. We design a frugal algorithm that uses only one unit of memory per group to compute a quantile for each group. For stochastic streams where data items are drawn from a distribution independently, we analyze and show that the algorithm finds an approximation to the quantile rapidly and remains stably close to it. We also propose an extension of this algorithm that uses two units of memory per group. We show with extensive experiments with real-world data from an HTTP trace and Twitter that our frugal algorithms are comparable to existing streaming algorithms for estimating any quantile, but these existing algorithms use far more space per group and are unrealistic in frugal applications; further, the two-memory frugal algorithm converges significantly faster than the one-memory algorithm.

Submitted 4 July, 2014; originally announced July 2014.

Comments: 12 pages, 11 figures
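
Illustrative note on the entry above: the one-unit-of-memory idea can be sketched as a randomized walk that nudges a single stored estimate up or down by one as items arrive. The sketch below is a minimal illustration under stated assumptions, not the authors' reference implementation: the function name `frugal_1u`, the integer-valued stream, the unit step size, and the seeded generator are all choices made here for the example.

```python
import random


def frugal_1u(stream, q=0.5, start=0, rng=None):
    """Estimate the q-quantile of an integer stream using one stored value.

    Hypothetical helper for illustration; only `estimate` persists between items.
    """
    if rng is None:
        rng = random.Random(0)  # seeded only so that runs are repeatable
    estimate = start
    for item in stream:
        r = rng.random()
        if item > estimate and r > 1 - q:
            estimate += 1  # item is above the estimate: step up with probability q
        elif item < estimate and r > q:
            estimate -= 1  # item is below the estimate: step down with probability 1 - q
    return estimate
```

The walk is balanced when the fraction of items below the estimate equals `q`, which is why the estimate tends to hover near the target quantile on streams drawn independently from a fixed distribution. For example, `frugal_1u(values, q=0.9)` on a long stream of integers drawn uniformly from 0 to 1000 should settle in the neighborhood of 900.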

arXiv:1407.0788 [pdf, ps, other]

Adscape: Harvesting and Analyzing Online Display Ads

Authors: Paul Barford, Igor Canadi, Darja Krushevskaja, Qiang Ma, S. Muthukrishnan

Abstract: Over the past decade, advertising has emerged as the primary source of revenue for many web sites and apps. In this paper we report a first-of-its-kind study that seeks to broadly understand the features, mechanisms and dynamics of display advertising on the web - i.e., the Adscape. Our study takes the perspective of users who are the targets of display ads shown on web sites. We develop a scalable crawling capability that enables us to gather the details of display ads including creatives and landing pages. Our crawling strategy is focused on maximizing the number of unique ads harvested. Of critical importance to our study is the recognition that a user's profile (i.e. browser profile and cookies) can have a significant impact on which ads are shown. We deploy our crawler over a variety of websites and profiles and this yields over 175K distinct display ads. We find that while targeting is widely used, there remain many instances in which delivered ads do not depend on user profile; further, ads vary more over user profiles than over websites. We also assess the population of advertisers seen and identify over 3.7K distinct entities from a variety of business segments. Finally, we find that when targeting is used, the specific types of ads delivered generally correspond with the details of user profiles, and also with users' patterns of visit.

Submitted 4 July, 2014; v1 submitted 3 July, 2014; originally announced July 2014.

Comments: 11 pages, 9 figures

arXiv:1403.2941 [pdf, other]

People Like Us: Mining Scholarly Data for Comparable Researchers

Authors: Graham Cormode, S. Muthukrishnan, Jinyun Yan

Abstract: We present the problem of finding comparable researchers for any given researcher. This problem has many motivations. Firstly, know thyself. The answers of where we stand among the research community and who we are most alike may not be easily found by existing evaluations of one's research mainly based on citation counts. Secondly, there are many situations where one needs to find comparable researchers, e.g., for reviewing peers, constructing programming committees or compiling teams for grants. It is often done on an ad hoc and informal basis. Utilizing the large scale scholarly data accessible on the web, we address the problem of automatically finding comparable researchers. We propose a standard to quantify the quality of research output, via the quality of publishing venues. We represent a researcher as a sequence of her publication records, and develop a framework of comparison of researchers by sequence matching. Several variations of comparisons are considered including matching by quality of publication venue and research topics, and performing prefix matching. We evaluate our methods on a large corpus and demonstrate the effectiveness of our methods through examples. In the end, we identify several promising directions for further work.

Submitted 6 July, 2014; v1 submitted 27 February, 2014; originally announced March 2014.

Comments: BigScholar at WWW 2014

arXiv:1312.7076 [pdf, other]

A Consensus-Focused Group Recommender System

Authors: Stratis Ioannidis, S. Muthukrishnan, Jinyun Yan

Abstract: In many cases, recommendations are consumed by groups of users rather than individuals. In this paper, we present a system which recommends social events to groups. The system helps groups to organize a joint activity and collectively select which activity to perform among several possible options. We also facilitate consensus making, following the principle of group consensus decision making. Our system allows users to asynchronously vote, add and comment on alternatives. We observe social influence within groups through post-recommendation feedback during the group decision making process. We propose a decision cascading model and estimate such social influence, which can be used to improve the performance of group recommendation. We conduct experiments to measure the prediction performance of our model. The results show that the model achieves better results than an independent decision making model.

Submitted 25 March, 2014; v1 submitted 26 December, 2013; originally announced December 2013.

arXiv:1310.1968 [pdf, ps, other]

doi 10.1145/2508497.2508500

First Author Advantage: Citation Labeling in Research

Authors: Graham Cormode, S. Muthukrishnan, Jinyun Yan

Abstract: Citations among research papers, and the networks they form, are the primary object of study in scientometrics. The act of making a citation reflects the citer's knowledge of the related literature, and of the work being cited. We aim to gain insight into this process by studying citation keys: user-chosen labels to identify a cited work. Our main observation is that the first listed author is disproportionately represented in such labels, implying a strong mental bias towards the first author.

Submitted 7 October, 2013; originally announced October 2013.

Comments: Computational Scientometrics: Theory and Applications at The 22nd CIKM 2013

arXiv:1302.5724 [pdf, ps, other]

doi 10.1007/978-3-642-54423-1_62

Budget Feasible Mechanisms for Experimental Design

Authors: Thibaut Horel, Stratis Ioannidis, S. Muthukrishnan

Abstract: In the classical experimental design setting, an experimenter E has access to a population of $n$ potential experiment subjects $i\in \{1,...,n\}$, each associated with a vector of features $x_i\in R^d$. Conducting an experiment with subject $i$ reveals an unknown value $y_i\in R$ to E. E typically assumes some hypothetical relationship between $x_i$'s and $y_i$'s, e.g., $y_i \approx βx_i$, and estimates $β$ from experiments, e.g., through linear regression. As a proxy for various practical constraints, E may select only a subset of subjects on which to conduct the experiment. We initiate the study of budgeted mechanisms for experimental design. In this setting, E has a budget $B$. Each subject $i$ declares an associated cost $c_i >0$ to be part of the experiment, and must be paid at least her cost. In particular, the Experimental Design Problem (EDP) is to find a set $S$ of subjects for the experiment that maximizes $V(S) = \log\det(I_d+\sum_{i\in S}x_i x_i^{T})$ under the constraint $\sum_{i\in S}c_i\leq B$; our objective function corresponds to the information gain in parameter $β$ that is learned through linear regression methods, and is related to the so-called $D$-optimality criterion. Further, the subjects are strategic and may lie about their costs. We present a deterministic, polynomial time, budget feasible mechanism scheme, that is approximately truthful and yields a constant factor approximation to EDP.
In particular, for any small $δ> 0$ and $ε> 0$, we can construct a (12.98, $ε$)-approximate mechanism that is $δ$-truthful and runs in polynomial time in both $n$ and $\log\log\frac{B}{εδ}$. We also establish that no truthful, budget-feasible algorithm is possible within a factor 2 approximation, and show how to generalize our approach to a wide class of learning problems, beyond linear regression.

Submitted 11 July, 2013; v1 submitted 22 February, 2013; originally announced February 2013.

Journal ref: LATIN 2014: Theoretical Informatics. Lecture Notes in Computer Science Volume 8392, 2014, pp 719-730

arXiv:1301.6447 [pdf, ps, other]

Nearly Optimal Private Convolution

Authors: Nadia Fawaz, S. Muthukrishnan, Aleksandar Nikolov

Abstract: We study computing the convolution of a private input $x$ with a public input $h$, while satisfying the guarantees of $(ε, δ)$-differential privacy. Convolution is a fundamental operation, intimately related to Fourier Transforms. In our setting, the private input may represent a time series of sensitive events or a histogram of a database of confidential personal information. Convolution then captures important primitives including linear filtering, which is an essential tool in time series analysis, and aggregation queries on projections of the data. We give a nearly optimal algorithm for computing convolutions while satisfying $(ε, δ)$-differential privacy. Surprisingly, we follow the simple strategy of adding independent Laplacian noise to each Fourier coefficient and bounding the privacy loss using the composition theorem of Dwork, Rothblum, and Vadhan. We derive a closed form expression for the optimal noise to add to each Fourier coefficient using convex programming duality. Our algorithm is very efficient -- it is essentially no more computationally expensive than a Fast Fourier Transform. To prove near optimality, we use the recent discrepancy lower bounds of Muthukrishnan and Nikolov and derive a spectral lower bound using a characterization of discrepancy in terms of determinants.

Submitted 27 January, 2013; originally announced January 2013.

arXiv:1211.7133 [pdf]

Socializing the h-index

Authors: Graham Cormode, Qiang Ma, S. Muthukrishnan, Brian Thompson

Abstract: A variety of bibliometric measures have been proposed to quantify the impact of researchers and their work. The h-index is a notable and widely-used example which aims to improve over simple metrics such as raw counts of papers or citations. However, a limitation of this measure is that it considers authors in isolation and does not account for contributions through a collaborative team. To address this, we propose a natural variant that we dub the Social h-index. The idea is to redistribute the h-index score to reflect an individual's impact on the research community. In addition to describing this new measure, we provide examples, discuss its properties, and contrast with other measures.

Submitted 7 May, 2013; v1 submitted 29 November, 2012; originally announced November 2012.

Comments: 5 pages, 3 figures, 1 table

arXiv:1207.3024 [pdf, other]

A Time and Space Efficient Algorithm for Contextual Linear Bandits

Authors: José Bento, Stratis Ioannidis, S. Muthukrishnan, Jinyun Yan

Abstract: We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve $O(\log T)$ regret after $T$ time steps. However, proposed methods either have a computational complexity per iteration that scales linearly with $T$ or achieve regrets that grow linearly with the number of contexts $|\mathcal{X}|$. We propose an $ε$-greedy type of algorithm that addresses both limitations. In particular, when contexts are variables in $\mathbb{R}^d$, we prove that our algorithm has a constant computational complexity per iteration of $O(poly(d))$ and can achieve a regret of $O(poly(d) \log T)$ even when $|\mathcal{X}| = Ω(2^d)$. In addition, unlike previous algorithms, its space complexity scales like $O(Kd^2)$ and does not grow with $T$.

Submitted 6 July, 2014; v1 submitted 12 July, 2012; originally announced July 2012.

Comments: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2013), Prague, Czech Republic, September 23-27, 2013. Proceedings. Springer, 2013

arXiv:1205.2740 [pdf, other]

Analyses of Cardinal Auctions

Authors: Mangesh Gupte, Darja Krushevskaja, S. Muthukrishnan

Abstract: We study cardinal auctions for selling multiple copies of a good, in which bidders specify not only their bid or how much they are ready to pay for the good, but also a cardinality constraint on the number of copies that will be sold via the auction. We perform the first known Price of Anarchy type analyses with detailed comparison of the classical Vickrey-Clarke-Groves (VCG) auction and one based on the minimum pay property (MPP), which is similar to the Generalized Second Price auction commonly used in sponsored search. Without cardinality constraints, MPP has the same efficiency (total value to bidders) and at least as much revenue (total income to the auctioneer) as VCG; this also holds for certain other generalizations of MPP (e.g., prefix constrained auctions, as we show here). In contrast, our main results are that, with cardinality constraints, (a) equilibrium efficiency of MPP is 1/2 of that of VCG and this factor is tight, and (b) in equilibrium MPP may collect as little as 1/2 the revenue of VCG. These aspects arise because in the presence of cardinality constraints, more strategies are available to bidders in MPP, including bidding above their value, and this makes analyses nontrivial.

Submitted 11 May, 2012; originally announced May 2012.

arXiv:1204.0535 [pdf, other]

Doubleclick Ad Exchange Auction

Authors: Yishay Mansour, S. Muthukrishnan, Noam Nisan

Abstract: Display advertisements on the web are sold via ad exchanges that use real time auction. We describe the challenges of designing a suitable auction, and present a simple auction called the Optional Second Price (OSP) auction that is currently used in Doubleclick Ad Exchange.

Submitted 2 April, 2012; originally announced April 2012.

arXiv:1203.5453 [pdf, ps, other]

Optimal Private Halfspace Counting via Discrepancy

Authors: S. Muthukrishnan, Aleksandar Nikolov

Abstract: A range counting problem is specified by a set $P$ of size $|P| = n$ of points in $\mathbb{R}^d$, an integer weight $x_p$ associated to each point $p \in P$, and a range space ${\cal R} \subseteq 2^{P}$. Given a query range $R \in {\cal R}$, the target output is $R(\vec{x}) = \sum_{p \in R}{x_p}$. Range counting for different range spaces is a central problem in Computational Geometry. We study $(ε, δ)$-differentially private algorithms for range counting. Our main results are for the range space given by hyperplanes, that is, the halfspace counting problem. We present an $(ε, δ)$-differentially private algorithm for halfspace counting in $d$ dimensions which achieves $O(n^{1-1/d})$ average squared error. This contrasts with the $Ω(n)$ lower bound established by the classical result of Dinur and Nissim [PODS 2003] for arbitrary subset counting queries. We also show a matching lower bound on average squared error for any $(ε, δ)$-differentially private algorithm for halfspace counting. Both bounds are obtained using discrepancy theory. For the lower bound, we use a modified discrepancy measure and bound approximation of $(ε, δ)$-differentially private algorithms for range counting queries in terms of this discrepancy. We also relate the modified discrepancy measure to classical combinatorial discrepancy, which allows us to exploit known discrepancy lower bounds.
This approach also yields a lower bound of $Ω((\log n)^{d-1})$ for $(ε, δ)$-differentially private orthogonal range counting in $d$ dimensions, the first known superconstant lower bound for this problem. For the upper bound, we use an approach inspired by partial coloring methods for proving discrepancy upper bounds, and obtain $(ε, δ)$-differentially private algorithms for range counting with polynomially bounded shatter function range spaces.

Submitted 24 March, 2012; originally announced March 2012.

ACM Class: F.2.0

arXiv:1202.2638 [pdf, other]

Scienceography: the study of how science is written

Authors: Graham Cormode, S. Muthukrishnan, Jinyun Yan

Abstract: Scientific literature has itself been the subject of much scientific study, for a variety of reasons: understanding how results are communicated, how ideas spread, and assessing the influence of areas or individuals. However, most prior work has focused on extracting and analyzing citation and stylistic patterns. In this work, we introduce the notion of 'scienceography', which focuses on the writing of science. We provide a first large scale study using data derived from the arXiv e-print repository. Crucially, our data includes the "source code" of scientific papers (the LaTeX source), which enables us to study features not present in the "final product", such as the tools used and private comments between authors. Our study identifies broad patterns and trends in two example areas (computer science and mathematics), as well as highlighting key differences in the way that science is written in these fields. Finally, we outline future directions to extend the new topic of scienceography.

Submitted 31 July, 2013; v1 submitted 13 February, 2012; originally announced February 2012.

Comments: 13 pages, 16 figures. Sixth International Conference on FUN WITH ALGORITHMS, 2012

arXiv:1110.3381 [pdf, ps, other]

Partial Data Compression and Text Indexing via Optimal Suffix Multi-Selection

Authors: Gianni Franceschini, Roberto Grossi, S. Muthukrishnan

Abstract: Consider an input text string T[1,N] drawn from an unbounded alphabet. We study partial computation in suffix-based problems for Data Compression and Text Indexing such as (I) retrieve any segment of K<=N consecutive symbols from the Burrows-Wheeler transform of T, and (II) retrieve any chunk of K<=N consecutive entries of the Suffix Array or the Suffix Tree. Prior literature would take O(N log N) comparisons (and time) to solve these problems by solving the total problem of building the entire Burrows-Wheeler transform or Text Index for T, and then post-processing to single out the wanted portion. We introduce a novel adaptive approach to the partial computational problems above, and solve both partial problems in O(K log K + N) comparisons and time, improving the best known running times of O(N log N) for K=o(N). These partial-computation problems are intimately related since they share a common bottleneck: the suffix multi-selection problem, which is to output the suffixes of rank r_1,r_2,...,r_K under the lexicographic order, where r_1<r_2<...<r_K, r_i in [1,N]. Special cases of this problem are well known: K=N is the suffix sorting problem that is the workhorse in Stringology with hundreds of applications, and K=1 is the recently studied suffix selection. We show that suffix multi-selection can be solved in Theta(N log N - sum_{j=0}^K Delta_j log Delta_j+N) time and comparisons, where r_0=0, r_{K+1}=N+1, and Delta_j=r_{j+1}-r_j for 0<=j<=K. This is asymptotically optimal, and also matches the bound in [Dobkin, Munro, JACM 28(3)] for multi-selection on atomic elements (not suffixes). Matching for strings the bound known for atomic elements is a long-running theme and challenge from the 1970s, which we achieve for the suffix multi-selection problem. The partial suffix problems as well as the suffix multi-selection problem have many applications.

Submitted 15 October, 2011; originally announced October 2011.

arXiv:1108.6123 [pdf, ps, other]

doi 10.1145/2448496.2448530

Private Decayed Sum Estimation under Continual Observation

Authors: Jean Bolot, Nadia Fawaz, S. Muthukrishnan, Aleksandar Nikolov, Nina Taft

Abstract: In monitoring applications, recent data is more important than distant data. How does this affect privacy of data analysis? We study a general class of data analyses - computing predicate sums - with privacy. Formally, we study the problem of estimating predicate sums privately, for sliding windows (and other well-known decay models of data, i.e. exponential and polynomial decay). We extend the recently proposed continual privacy model of Dwork et al. We present algorithms for decayed sum which are $\epsilon$-differentially private, and are accurate. For window and exponential decay sums, our algorithms are accurate up to additive $1/\epsilon$ and polylog terms in the range of the computed function; for polynomial decay sums which are technically more challenging because partial solutions do not compose easily, our algorithms incur additional relative error. Further, we show lower bounds, tight within polylog factors and tight with respect to the dependence on the probability of error.

Submitted 2 March, 2012; v1 submitted 30 August, 2011; originally announced August 2011.

arXiv:1107.2365 [pdf, ps, other]

doi 10.1007/978-3-642-35656-8_10

On some special cases of the Entropy Photon-Number Inequality

Authors: Smarajit Das, Naresh Sharma, Siddharth Muthukrishnan

Abstract: We show that the Entropy Photon-Number Inequality (EPnI) holds where one of the input states is the vacuum state and for several candidates of the other input state, including the cases when the state has the number states as its eigenvectors and either has only two non-zero eigenvalues or has an arbitrary number of non-zero eigenvalues but is a high entropy state. We also discuss the conditions which, if satisfied, would lead to an extension of these results.

Submitted 12 July, 2011; originally announced July 2011.

Comments: 12 pages, no figures

Journal ref: Proceedings of Theory of Quantum Computation, Communication, and Cryptography (TQC), Tokyo, Japan, May 2012, vol. 7582, Lecture Notes in Computer Science

arXiv:1102.2551 [pdf, other]

Yield Optimization of Display Advertising with Ad Exchange

Authors: Santiago Balseiro, Jon Feldman, Vahab Mirrokni, S. Muthukrishnan

Abstract: In light of the growing market of Ad Exchanges for the real-time sale of advertising slots, publishers face new challenges in choosing between the allocation of contract-based reservation ads and spot market ads. In this setting, the publisher should take into account the tradeoff between short-term revenue from an Ad Exchange and quality of allocating reservation ads. In this paper, we formalize this combined optimization problem as a stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange bids. We prove asymptotic optimality of this policy in terms of any trade-off between quality of delivered reservation ads and revenue from the exchange, and provide a rigorous bound for its convergence rate to the optimal policy. We also give experimental results on data derived from real publisher inventory, showing that our policy can achieve any Pareto-optimal point on the quality vs. revenue curve. Finally, we study a parametric training-based algorithm in which instead of learning the dual variables from sample data (as is done in non-parametric training-based algorithms), we learn the parameters of the distribution and construct those dual variables from the learned parameter values. We compare parametric and non-parametric ways to estimate from data both analytically and experimentally in the special case without the ad exchange, and show that though both methods converge to the optimal policy as the sample size grows, our parametric method converges faster, and thus performs better on smaller samples.

Submitted 21 September, 2012; v1 submitted 12 February, 2011; originally announced February 2011.

arXiv:1101.3291 [pdf, ps, other]

doi 10.1007/978-1-4419-8462-3_5

Node Classification in Social Networks

Authors: Smriti Bhagat, Graham Cormode, S. Muthukrishnan

Abstract: When dealing with large graphs, such as those that arise in the context of online social networks, a subset of nodes may be labeled. These labels can indicate demographic values, interest, beliefs or other characteristics of the nodes (users). A core problem is to use this information to extend the labeling so that all nodes are assigned a label (or labels). In this chapter, we survey classification techniques that have been proposed for this problem. We consider two broad categories: methods based on iterative application of traditional classifiers using graph information as features, and methods which propagate the existing labels via random walks. We adopt a common perspective on these methods to highlight the similarities between different approaches within and across the two categories. We also describe some extensions and related directions to the central problem of node classification.

Submitted 17 January, 2011; originally announced January 2011.

Comments: To appear in Social Network Data Analytics (Springer) Ed. Charu Aggarwal, March 2011

arXiv:1012.0412 [pdf, ps, other]

doi 10.1109/ISIT.2011.6033891

Entropy power inequality for a family of discrete random variables

Authors: Naresh Sharma, Smarajit Das, Siddharth Muthukrishnan

Abstract: It is known that the Entropy Power Inequality (EPI) always holds if the random variables have density. Not much work has been done to identify discrete distributions for which the inequality holds with the differential entropy replaced by the discrete entropy. Harremoës and Vignat showed that it holds for the pair (B(m,p), B(n,p)), m,n \in \mathbb{N}, (where B(n,p) is a Binomial distribution with n trials each with success probability p) for p = 0.5. In this paper, we considerably expand the set of Binomial distributions for which the inequality holds and, in particular, identify n_0(p) such that for all m,n \geq n_0(p), the EPI holds for (B(m,p), B(n,p)). We further show that the EPI holds for the discrete random variables that can be expressed as the sum of n independent identically distributed (IID) discrete random variables for large n.

Submitted 2 December, 2010; originally announced December 2010.

Comments: 18 pages, 1 figure

arXiv:1009.1544 [pdf, ps, other]

Pan-private Algorithms: When Memory Does Not Help

Authors: Darakhshan Mir, S. Muthukrishnan, Aleksandar Nikolov, Rebecca N. Wright

Abstract: Consider updates arriving online in which the $t$th input is $(i_t,d_t)$, where $i_t$'s are thought of as IDs of users. Informally, a randomized function $f$ is differentially private with respect to the IDs if the probability distribution induced by $f$ is not much different from that induced by it on an input in which occurrences of an ID $j$ are replaced with some other ID $k$. Recently, this notion was extended to pan-privacy where the computation of $f$ retains differential privacy, even if the internal memory of the algorithm is exposed to the adversary (say by a malicious break-in or by fiat by the government). This is a strong notion of privacy, and surprisingly, for basic counting tasks such as distinct counts, heavy hitters and others, Dwork et al. present pan-private algorithms with reasonable accuracy. The pan-private algorithms are nontrivial, and rely on sampling. We reexamine these basic counting tasks and show improved bounds. In particular, we estimate the distinct count $\Dt$ to within $(1\pm \epsilon)\Dt \pm O(\polylog m)$, where $m$ is the number of elements in the universe. This uses suitably noisy statistics on sketches known in the streaming literature. We also present the first known lower bounds for pan-privacy with respect to a single intrusion. Our lower bounds show that, even if allowed to work with unbounded memory, pan-private algorithms for distinct counts cannot be significantly more accurate than our algorithms. Our lower bound uses noisy decoding. For heavy hitter counts, we present a pan-private streaming algorithm that is accurate to within $O(k)$ in the worst case; the previously known bound for this problem is arbitrarily worse. An interesting aspect of our pan-private algorithms is that they deliberately use very small (polylogarithmic) space and tend to be streaming algorithms, even though using more space is not forbidden.

Submitted 8 September, 2010; originally announced September 2010.

Comments: 18 pages

arXiv:1008.1616 [pdf, ps, other]

Approximation Schemes for Sequential Posted Pricing in Multi-Unit Auctions

Authors: Tanmoy Chakraborty, Eyal Even-Dar, Sudipto Guha, Yishay Mansour, S. Muthukrishnan

Abstract: We design algorithms for computing approximately revenue-maximizing sequential posted-pricing mechanisms (SPM) in $K$-unit auctions, in a standard Bayesian model. A seller has $K$ copies of an item to sell, and there are $n$ buyers, each interested in only one copy, who have some value for the item. The seller must post a price for each buyer, the buyers arrive in a sequence enforced by the seller, and a buyer buys the item if its value exceeds the price posted to it. The seller does not know the values of the buyers, but has Bayesian information about them. An SPM specifies the ordering of buyers and the posted prices, and may be adaptive or non-adaptive in its behavior. The goal is to design SPM in polynomial time to maximize expected revenue. We compare against the expected revenue of optimal SPM, and provide a polynomial time approximation scheme (PTAS) for both non-adaptive and adaptive SPMs. This is achieved by two algorithms: an efficient algorithm that gives a $(1-\frac{1}{\sqrt{2\pi K}})$-approximation (and hence a PTAS for sufficiently large $K$), and another that is a PTAS for constant $K$. The first algorithm yields a non-adaptive SPM that achieves its approximation guarantees against an optimal adaptive SPM -- this implies that the adaptivity gap in SPMs vanishes as $K$ becomes larger.

Submitted 9 August, 2010; originally announced August 2010.

Comments: 16 pages

ACM Class: F.2.2; J.4

arXiv:1002.3102 [pdf, other]

Selective Call Out and Real Time Bidding

Authors: Tanmoy Chakraborty, Eyal Even-Dar, Sudipto Guha, Yishay Mansour, S. Muthukrishnan

Abstract: Ads on the Internet are increasingly sold via ad exchanges such as RightMedia, AdECN and Doubleclick Ad Exchange. These exchanges allow real-time bidding, that is, each time the publisher contacts the exchange, the exchange "calls out" to solicit bids from ad networks. This aspect of soliciting bids introduces a novel aspect, in contrast to existing literature. This suggests developing a joint optimization framework which optimizes over the allocation as well as the solicitation. We model this selective call out as an online recurrent Bayesian decision framework with bandwidth type constraints. We obtain natural algorithms with bounded performance guarantees for several natural optimization criteria. We show that these results hold under different call out constraint models, and different arrival processes. Interestingly, the paper shows that under MHR assumptions, the expected revenue of generalized second price auction with reserve is a constant factor of the expected welfare. Also, the analysis herein allows us to prove adaptivity gap type results for the adwords problem.

Submitted 10 August, 2010; v1 submitted 16 February, 2010; originally announced February 2010.

Comments: 24 pages, 10 figures

ACM Class: F.2.2; J.4

arXiv:1001.2735 [pdf, ps, other]

doi 10.1007/s00453-012-9614-x

Stochastic Budget Optimization in Internet Advertising

Authors: Bhaskar DasGupta, S. Muthukrishnan

Abstract: Internet advertising is a sophisticated game in which the many advertisers "play" to optimize their return on investment. There are many "targets" for the advertisements, and each "target" has a collection of games with a potentially different set of players involved. In this paper, we study the problem of how advertisers allocate their budget across these "targets". In particular, we focus on formulating their best response strategy as an optimization problem. Advertisers have a set of keywords ("targets") and some stochastic information about the future, namely a probability distribution over scenarios of cost vs click combinations. This summarizes the potential states of the world assuming that the strategies of other players are fixed. Then, the best response can be abstracted as stochastic budget optimization problems to figure out how to spread a given budget across these keywords to maximize the expected number of clicks. We present the first known non-trivial poly-logarithmic approximation for these problems as well as the first known hardness results of getting better than logarithmic approximation ratios in the various parameters involved. We also identify several special cases of these problems of practical interest, such as with fixed number of scenarios or with polynomial-sized parameters related to cost, which are solvable either in polynomial time or with improved approximation ratios. Stochastic budget optimization with scenarios has sophisticated technical structure. Our approximation and hardness results come from relating these problems to a special type of (0/1, bipartite) quadratic programs inherent in them. Our research answers some open problems raised by the authors in (Stochastic Models for Budget Optimization in Search-Based Advertising, Algorithmica, 58 (4), 1022-1044, 2010).

Submitted 3 February, 2013; v1 submitted 15 January, 2010; originally announced January 2010.

Comments: FINAL version

MSC Class: 68Q17; 68Q25; 91B26; 91B32 ACM Class: F.2.2; J.4

Journal ref: Algorithmica, 65 (3), 634-661, 2013

arXiv:0909.5365 [pdf, ps, other]

Quasi-Proportional Mechanisms: Prior-free Revenue Maximization

Authors: Vahab Mirrokni, S. Muthukrishnan, Uri Nadav

Abstract: Inspired by Internet ad auction applications, we study the problem of allocating a single item via an auction when bidders place very different values on the item. We formulate this as the problem of prior-free auction and focus on designing a simple mechanism that always allocates the item. Rather than designing sophisticated pricing methods as in prior literature, we design better allocation methods. In particular, we propose quasi-proportional allocation methods in which the probability that an item is allocated to a bidder depends (quasi-proportionally) on the bids. We prove that the corresponding games for both all-pay and winners-pay quasi-proportional mechanisms admit a pure Nash equilibrium and that this equilibrium is unique. We also give an algorithm to compute this equilibrium in polynomial time. Further, we show that the revenue of the auctioneer is promisingly high compared to the ultimate, i.e., the highest value of any of the bidders, and show bounds on the revenue of equilibria both analytically, as well as using experiments for specific quasi-proportional functions. This is the first known revenue analysis for these natural mechanisms (including the special case of proportional mechanism which is common in network resource allocation problems).

Submitted 29 September, 2009; originally announced September 2009.

arXiv:0905.4100 [pdf, ps, other]

Online Stochastic Matching: Beating 1-1/e

Authors: Jon Feldman, Aranyak Mehta, Vahab Mirrokni, S. Muthukrishnan

Abstract: We study the online stochastic bipartite matching problem, in a form motivated by display ad allocation on the Internet. In the online, but adversarial case, the celebrated result of Karp, Vazirani and Vazirani gives an approximation ratio of $1-1/e$. In the online, stochastic case when nodes are drawn repeatedly from a known distribution, the greedy algorithm matches this approximation ratio, but still, no algorithm is known that beats the $1 - 1/e$ bound. Our main result is a 0.67-approximation online algorithm for stochastic bipartite matching, breaking this $1 - 1/e$ barrier. Furthermore, we show that no online algorithm can produce a $1-\epsilon$ approximation for an arbitrarily small $\epsilon$ for this problem. We employ a novel application of the idea of the power of two choices from load balancing: we compute two disjoint solutions to the expected instance, and use both of them in the online algorithm in a prescribed preference order. To identify these two disjoint solutions, we solve a max flow problem in a boosted flow graph, and then carefully decompose this maximum flow into two edge-disjoint (near-)matchings. These two offline solutions are used to characterize an upper bound for the optimum in any scenario. This is done by identifying a cut whose value we can bound under the arrival distribution.

Submitted 25 May, 2009; originally announced May 2009.

arXiv:0902.1737 [pdf, ps, other]

Optimal cache-aware suffix selection

Authors: Gianni Franceschini, Roberto Grossi, S. Muthukrishnan

Abstract: Given string $S[1..N]$ and integer $k$, the suffix selection problem is to determine the $k$th lexicographically smallest amongst the suffixes $S[i..N]$, $1 \leq i \leq N$. We study the suffix selection problem in the cache-aware model that captures two-level memory inherent in computing systems, for a cache of limited size $M$ and block size $B$. The complexity of interest is the number of block transfers. We present an optimal suffix selection algorithm in the cache-aware model, requiring $\Theta(N/B)$ block transfers, for any string $S$ over an unbounded alphabet (where characters can only be compared), under the common tall-cache assumption (i.e. $M=\Omega(B^{1+\epsilon})$, where $\epsilon<1$). Our algorithm beats the bottleneck bound for permuting an input array to the desired output array, which holds for nearly any nontrivial problem in hierarchical memory models.

Submitted 10 February, 2009; originally announced February 2009.

Journal ref: 26th International Symposium on Theoretical Aspects of Computer Science - STACS 2009 (2009) 457-468
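
To make the suffix selection problem stated in this abstract concrete, here is a minimal sketch that solves it naively by sorting all suffixes. It is not the paper's cache-aware algorithm and says nothing about block transfers; the function name `kth_smallest_suffix` is hypothetical, and the approach can take O(N^2 log N) character comparisons in the worst case.

```python
def kth_smallest_suffix(s: str, k: int) -> int:
    """Return the 1-based start position of the k-th lexicographically
    smallest suffix of s (naive baseline, assumes 1 <= k <= len(s))."""
    if not 1 <= k <= len(s):
        raise ValueError("k must satisfy 1 <= k <= len(s)")
    # Sort the start positions by the suffix that begins at each position.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    return order[k - 1] + 1


# Suffixes of "banana" in sorted order start at positions 6, 4, 2, 1, 5, 3.
position = kth_smallest_suffix("banana", 3)
```

A baseline like this is mainly useful for checking a faster implementation on small inputs.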

arXiv:0901.3754 [pdf, ps, other]

Bid Optimization in Broad-Match Ad auctions

Authors: Eyal Even-dar, Yishay Mansour, Vahab Mirrokni, S. Muthukrishnan, Uri Nadav

Abstract: Ad auctions in sponsored search support "broad match" that allows an advertiser to target a large number of queries while bidding only on a limited number. While giving more expressiveness to advertisers, this feature makes it challenging to optimize bids to maximize their returns: choosing to bid on a query as a broad match because it provides high profit results in one bidding for related queries which may yield low or even negative profits. We abstract and study the complexity of the bid optimization problem which is to determine an advertiser's bids on a subset of keywords (possibly using broad match) so that her profit is maximized. In the query language model when the advertiser is allowed to bid on all queries as broad match, we present a linear programming (LP)-based polynomial-time algorithm that gets the optimal profit. In the model in which an advertiser can only bid on keywords, i.e., a subset of keywords as an exact or broad match, we show that this problem is not approximable within any reasonable approximation factor unless P=NP. To deal with this hardness result, we present a constant-factor approximation when the optimal profit significantly exceeds the cost. This algorithm is based on rounding a natural LP formulation of the problem. Finally, we study a budgeted variant of the problem, and show that in the query language model, one can find two budget constrained ad campaigns in polynomial time that implement the optimal bidding strategy. Our results are the first to address bid optimization under the broad match feature which is common in ad auctions.

Submitted 23 January, 2009; originally announced January 2009.

Comments: World Wide Web Conference (WWW09), 10 pages, 2 figures

ACM Class: F.2; J.4; H.4

arXiv:0807.1297 [pdf, ps, other]

General Auction Mechanism for Search Advertising

Authors: Gagan Aggarwal, S. Muthukrishnan, David Pal, Martin Pal

Abstract: In sponsored search, a number of advertising slots are available on a search results page, and have to be allocated among a set of advertisers competing to display an ad on the page. This gives rise to a bipartite matching market that is typically cleared by way of an automated auction. Several auction mechanisms have been proposed, with variants of the Generalized Second Price (GSP) being widely used in practice. A rich body of work on bipartite matching markets builds upon the stable marriage model of Gale and Shapley and the assignment model of Shapley and Shubik. We apply insights from this line of research into the structure of stable outcomes and their incentive properties to advertising auctions. We model advertising auctions in terms of an assignment model with linear utilities, extended with bidder and item specific maximum and minimum prices. Auction mechanisms like the commonly used GSP or the well-known Vickrey-Clarke-Groves (VCG) are interpreted as simply computing a bidder-optimal stable matching in this model, for a suitably defined set of bidder preferences. In our model, the existence of a stable matching is guaranteed, and under a non-degeneracy assumption a bidder-optimal stable matching exists as well. We give an algorithm to find such a matching in polynomial time, and use it to design a truthful mechanism that generalizes GSP, is truthful for profit-maximizing bidders, implements features like bidder-specific minimum prices and position-specific bids, and works for rich mixtures of bidders and preferences.

Submitted 8 July, 2008; originally announced July 2008.

arXiv:0807.0222 [pdf, ps, other]

Range Medians

Authors: Sariel Har-Peled, S. Muthukrishnan

Abstract: We study a generalization of the classical median finding problem to the batched query case: given an array of $n$ unsorted items and $k$ (not necessarily disjoint) intervals in the array, the goal is to determine the median in each of the intervals in the array. We give an algorithm that uses $O(n\log n + k\log k \log n)$ comparisons and show a lower bound of $\Omega(n\log k)$ comparisons for this problem. This is optimal for $k=O(n/\log n)$.

Submitted 1 July, 2008; originally announced July 2008.

Comments: To appear in ESA 08
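
The batched range-median problem in this abstract can be illustrated with a brute-force baseline that sorts each interval independently. This is only a sketch of the problem statement, not the paper's algorithm, and it does not achieve the stated comparison bound. The function name `range_medians`, the 0-based inclusive interval convention, and the choice of the lower median for even-length intervals are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def range_medians(items: Sequence[int],
                  intervals: Sequence[Tuple[int, int]]) -> List[int]:
    """Return the (lower) median of items[lo..hi] for each interval.

    Intervals are 0-based and inclusive; they may overlap.
    """
    medians = []
    for lo, hi in intervals:
        window = sorted(items[lo:hi + 1])  # sort a copy of the interval
        medians.append(window[(len(window) - 1) // 2])
    return medians


data = [5, 1, 4, 2, 3]
queries = [(0, 4), (1, 3), (2, 2)]
result = range_medians(data, queries)  # one median per query interval
```

Because each query re-sorts its interval from scratch, the cost grows with the total length of all intervals, which is the redundancy a batched algorithm aims to avoid.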

Showing 1–50 of 63 results for author: Muthukrishnan, S