Search | arXiv e-print repository

Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds

Authors: Nikhil Bansal, Vincent Cohen-Addad, Milind Prabhu, David Saulpic, Chris Schwiegelshohn

Abstract: Coresets are arguably the most popular compression paradigm for center-based clustering objectives such as $k$-means. Given a point set $P$, a coreset $Ω$ is a small, weighted summary that preserves the cost of all candidate solutions $S$ up to a $(1\pm \varepsilon)$ factor. For $k$-means in $d$-dimensional Euclidean space the cost for solution $S$ is $\sum_{p\in P}\min_{s\in S}\|p-s\|^2$. A very popular method for coreset construction, both in theory and practice, is Sensitivity Sampling, where points are sampled in proportion to their importance. We show that Sensitivity Sampling yields optimal coresets of size $\tilde{O}(k/\varepsilon^2\min(\sqrt{k},\varepsilon^{-2}))$ for worst-case instances. Uniquely among all known coreset algorithms, for well-clusterable data sets with $Ω(1)$ cost stability, Sensitivity Sampling gives coresets of size $\tilde{O}(k/\varepsilon^2)$, improving over the worst-case lower bound. Notably, Sensitivity Sampling does not have to know the cost stability in order to exploit it: It is appropriately sensitive to the clusterability of the data set while being oblivious to it. We also show that any coreset for stable instances consisting of only input points must have size $Ω(k/\varepsilon^2)$. Our results for Sensitivity Sampling also extend to the $k$-median problem, and more general metric spaces.
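
As a rough illustration of the objects named in this abstract, the sketch below computes the $k$-means cost $\sum_{p\in P}\min_{s\in S}\|p-s\|^2$ and draws a weighted sample by generic importance sampling. The function names and the `scores` argument are hypothetical. The abstract does not say how the paper computes sensitivities, so the scores here are supplied by the caller and this is not the authors' construction.

```python
import random

def sq_dist(p, s):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, s))

def kmeans_cost(points, centers, weights=None):
    """Weighted k-means cost: each point pays its squared distance to the nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, s) for s in centers)
               for p, w in zip(points, weights))

def importance_sample(points, scores, m, rng):
    """Draw m points with probability proportional to scores (assumed positive).

    Each sampled point gets weight 1 / (m * probability), which makes the
    weighted cost an unbiased estimate of the full cost for any fixed centers.
    """
    total = sum(scores)
    probs = [s / total for s in scores]
    idx = rng.choices(range(len(points)), weights=probs, k=m)
    sample = [points[i] for i in idx]
    weights = [1.0 / (m * probs[i]) for i in idx]
    return sample, weights

# Toy instance: three points on a line and two candidate centers.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centers = [(1.0, 0.0), (10.0, 0.0)]
full_cost = kmeans_cost(points, centers)  # 1 + 1 + 0
```

A weighted sample `(sample, weights)` can be compared against the full data by evaluating `kmeans_cost(sample, centers, weights)` for several candidate center sets. A coreset in the sense of the abstract keeps that value within a $(1\pm\varepsilon)$ factor of `kmeans_cost(points, centers)` for all of them.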

Submitted 2 May, 2024; originally announced May 2024.

Comments: 57 pages

arXiv:2402.17008 [pdf, other]

Benchmarking LLMs on the Semantic Overlap Summarization Task

Authors: John Salvador, Naman Bansal, Mousumi Akter, Souvika Sarkar, Anupam Das, Shubhra Kanti Karmaker

Abstract: Semantic Overlap Summarization (SOS) is a constrained multi-document summarization task, where the constraint is to capture the common/overlapping information between two alternative narratives. While recent advancements in Large Language Models (LLMs) have achieved superior performance in numerous summarization tasks, a benchmarking study of the SOS task using LLMs is yet to be performed. As LLMs' responses are sensitive to slight variations in prompt design, a major challenge in conducting such a benchmarking study is to systematically explore a variety of prompts before drawing a reliable conclusion. Fortunately, very recently, the TELeR taxonomy has been proposed which can be used to design and explore various prompts for LLMs. Using this TELeR taxonomy and 15 popular LLMs, this paper comprehensively evaluates LLMs on the SOS Task, assessing their ability to summarize overlapping information from multiple alternative narratives. For evaluation, we report well-established metrics like ROUGE, BERTscore, and SEM-F1 on two different datasets of alternative narratives. We conclude the paper by analyzing the strengths and limitations of various LLMs in terms of their capabilities in capturing overlapping information. The code and datasets used to conduct this study are available at https://anonymous.4open.science/r/llm_eval-E16D.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.15589 [pdf, other]

Prompting LLMs to Compose Meta-Review Drafts from Peer-Review Narratives of Scholarly Manuscripts

Authors: Shubhra Kanti Karmaker Santu, Sanjeev Kumar Sinha, Naman Bansal, Alex Knipper, Souvika Sarkar, John Salvador, Yash Mahajan, Sri Guttikonda, Mousumi Akter, Matthew Freestone, Matthew C. Williams Jr

Abstract: One of the most important yet onerous tasks in the academic peer-reviewing process is composing meta-reviews, which involves understanding the core contributions, strengths, and weaknesses of a scholarly manuscript based on peer-review narratives from multiple experts and then summarizing those multiple experts' perspectives into a concise holistic overview. Given the latest major developments in generative AI, especially Large Language Models (LLMs), it is very compelling to rigorously study the utility of LLMs in generating such meta-reviews in an academic peer-review setting. In this paper, we perform a case study with three popular LLMs, i.e., GPT-3.5, LLaMA2, and PaLM2, to automatically generate meta-reviews by prompting them with different types/levels of prompts based on the recently proposed TELeR taxonomy. Finally, we perform a detailed qualitative study of the meta-reviews generated by the LLMs and summarize our findings and recommendations for prompting LLMs for this complex task.

Submitted 23 February, 2024; originally announced February 2024.

ACM Class: I.2.7

arXiv:2312.06902 [pdf, other]

Perseus: Removing Energy Bloat from Large Model Training

Authors: Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury

Abstract: Training large AI models on numerous GPUs consumes a massive amount of energy. We observe that not all energy consumed during training directly contributes to end-to-end training throughput, and a significant portion can be removed without slowing down training, which we call energy bloat. In this work, we identify two independent sources of energy bloat in large model training, intrinsic and extrinsic, and propose Perseus, a unified optimization framework that mitigates both. Perseus obtains the "iteration time-energy" Pareto frontier of any large model training job using an efficient iterative graph cut-based algorithm and schedules energy consumption of its forward and backward computations across time to remove intrinsic and extrinsic energy bloat. Evaluation on large models like GPT-3 and Bloom shows that Perseus reduces energy consumption of large model training by up to 30%, enabling savings otherwise unobtainable before.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Open-source at https://ml.energy/zeus/perseus/

arXiv:2311.15639 [pdf, other]

On Approximating Cutwidth and Pathwidth

Authors: Nikhil Bansal, Dor Katzelnick, Roy Schwartz

Abstract: We study graph ordering problems with a min-max objective. A classical problem of this type is cutwidth, where given a graph we want to order its vertices such that the number of edges crossing any point is minimized. We give a $\log^{1+o(1)}(n)$ approximation for the problem, substantially improving upon the previous poly-logarithmic guarantees based on the standard recursive balanced partitioning approach of Leighton and Rao (FOCS'88). Our key idea is a new metric decomposition procedure that is suitable for handling min-max objectives, which could be of independent interest. We also use this to show other results, including an improved $\log^{1+o(1)}(n)$ approximation for computing the pathwidth of a graph.
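
To make the cutwidth objective concrete, the sketch below evaluates a given vertex ordering and, for very small graphs, finds the optimum by exhaustive search. The function names are hypothetical. It illustrates only the problem definition, not the approximation algorithm from the paper.

```python
from itertools import permutations

def cutwidth_of_ordering(edges, order):
    """Largest number of edges crossing any gap between consecutive vertices in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    crossing = [0] * max(len(order) - 1, 0)
    for u, v in edges:
        lo, hi = sorted((pos[u], pos[v]))
        # The edge crosses every gap strictly between its two endpoints.
        for gap in range(lo, hi):
            crossing[gap] += 1
    return max(crossing, default=0)

def brute_force_cutwidth(vertices, edges):
    """Exact cutwidth by trying every ordering; only feasible for a handful of vertices."""
    return min(cutwidth_of_ordering(edges, order) for order in permutations(vertices))

# A path has cutwidth 1 in its natural order; a star with three leaves needs 2.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
star_edges = [("c", "x"), ("c", "y"), ("c", "z")]
```

The exhaustive search takes time factorial in the number of vertices, which is why approximation guarantees such as the one in the abstract matter for larger graphs.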

Submitted 12 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2309.03747 [pdf, other]

The Daunting Dilemma with Sentence Encoders: Success on Standard Benchmarks, Failure in Capturing Basic Semantic Properties

Authors: Yash Mahajan, Naman Bansal, Shubhra Kanti Karmaker

Abstract: In this paper, we adopted a retrospective approach to examine and compare five existing popular sentence encoders, i.e., Sentence-BERT, Universal Sentence Encoder (USE), LASER, InferSent, and Doc2vec, in terms of their performance on downstream tasks versus their capability to capture basic semantic properties. Initially, we evaluated all five sentence encoders on the popular SentEval benchmark and found that multiple sentence encoders perform quite well on a variety of popular downstream tasks. However, being unable to find a single winner in all cases, we designed further experiments to gain a deeper understanding of their behavior. Specifically, we proposed four semantic evaluation criteria, i.e., Paraphrasing, Synonym Replacement, Antonym Replacement, and Sentence Jumbling, and evaluated the same five sentence encoders using these criteria. We found that the Sentence-BERT and USE models pass the paraphrasing criterion, with SBERT being the superior of the two. LASER dominates in the case of the synonym replacement criterion. Interestingly, all the sentence encoders failed the antonym replacement and jumbling criteria. These results suggest that although these popular sentence encoders perform quite well on the SentEval benchmark, they still struggle to capture some basic semantic properties, thus posing a daunting dilemma in NLP research.

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2307.13937 [pdf, ps, other]

On Minimizing Generalized Makespan on Unrelated Machines

Authors: Nikhil Ayyadevara, Nikhil Bansal, Milind Prabhu

Abstract: We consider the Generalized Makespan Problem (GMP) on unrelated machines, where we are given $n$ jobs and $m$ machines and each job $j$ has arbitrary processing time $p_{ij}$ on machine $i$. Additionally, there is a general symmetric monotone norm $ψ_i$ for each machine $i$ that determines the load on machine $i$ as a function of the sizes of jobs assigned to it. The goal is to assign the jobs to minimize the maximum machine load. Recently, Deng, Li, and Rabani (SODA'22) gave a $3$-approximation for GMP when the $ψ_i$ are top-$k$ norms, and they asked whether an $O(1)$ approximation exists for general norms $ψ$. We answer this negatively and show that, under natural complexity assumptions, there is some fixed constant $δ>0$ such that GMP is $Ω(\log^δ n)$ hard to approximate. We also give an $Ω(\log^{1/2} n)$ integrality gap for the natural configuration LP.
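
The objective in this abstract can be made concrete with a small evaluator, sketched below, that computes the maximum machine load of a fixed assignment when each machine has its own norm. The function names and the toy instance are hypothetical. The sketch only evaluates a given assignment and implements neither the hardness construction nor any approximation algorithm.

```python
def top_k_norm(k):
    """Return a function computing the top-k norm: the sum of the k largest absolute values."""
    def norm(sizes):
        return sum(sorted((abs(x) for x in sizes), reverse=True)[:k])
    return norm

def generalized_makespan(p, assignment, norms):
    """Maximum machine load under per-machine norms.

    p[i][j]       processing time of job j on machine i
    assignment[j] machine that job j is assigned to
    norms[i]      function mapping the list of job sizes on machine i to its load
    """
    per_machine = [[] for _ in p]
    for j, i in enumerate(assignment):
        per_machine[i].append(p[i][j])
    return max(norms[i](per_machine[i]) for i in range(len(p)))

# Two machines, three jobs; jobs 0 and 1 go to machine 0, job 2 to machine 1.
p = [[3, 1, 2],
     [2, 4, 1]]
assignment = [0, 0, 1]
max_norm_load = generalized_makespan(p, assignment, [top_k_norm(1), top_k_norm(1)])
sum_norm_load = generalized_makespan(p, assignment, [top_k_norm(3), top_k_norm(3)])
```

With $k = 1$ the top-$k$ norm is the largest job size, and with $k$ at least the number of jobs it is the sum of sizes, which recovers the ordinary makespan.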

Submitted 25 July, 2023; originally announced July 2023.

Comments: 14 pages

arXiv:2211.05429 [pdf, other]

doi 10.1145/3503161.3547747

DrawMon: A Distributed System for Detection of Atypical Sketch Content in Concurrent Pictionary Games

Authors: Nikhil Bansal, Kartik Gupta, Kiruthika Kannan, Sivani Pentapati, Ravi Kiran Sarvadevabhatla

Abstract: Pictionary, the popular sketch-based guessing game, provides an opportunity to analyze shared goal cooperative game play in restricted communication settings. However, some players occasionally draw atypical sketch content. While such content is occasionally relevant in the game context, it sometimes represents a rule violation and impairs the game experience. To address such situations in a timely and scalable manner, we introduce DrawMon, a novel distributed framework for automatic detection of atypical sketch content in concurrently occurring Pictionary game sessions. We build specialized online interfaces to collect game session data and annotate atypical sketch content, resulting in AtyPict, the first ever atypical sketch content dataset. We use AtyPict to train CanvasNet, a deep neural atypical content detection network. We utilize CanvasNet as a core component of DrawMon. Our analysis of post deployment game session data indicates DrawMon's effectiveness for scalable monitoring and atypical sketch content detection. Beyond Pictionary, our contributions also serve as a design guide for customized atypical content response systems involving shared and interactive whiteboards. Code and datasets are available at https://drawm0n.github.io.

Submitted 10 November, 2022; originally announced November 2022.

Comments: Presented at ACM Multimedia 2022. For project page and dataset, visit https://drawm0n.github.io

arXiv:2210.14383 [pdf, other]

CLIP-FLow: Contrastive Learning by semi-supervised Iterative Pseudo labeling for Optical Flow Estimation

Authors: Zhiqi Zhang, Nitin Bansal, Changjiang Cai, Pan Ji, Qingan Yan, Xiangyu Xu, Yi Xu

Abstract: Synthetic datasets are often used to pretrain end-to-end optical flow networks, due to the lack of a large amount of labeled, real-scene data. But major drops in accuracy occur when moving from synthetic to real scenes. How do we better transfer the knowledge learned from synthetic to real domains? To this end, we propose CLIP-FLow, a semi-supervised iterative pseudo-labeling framework to transfer the pretraining knowledge to the target real domain. We leverage large-scale, unlabeled real data to facilitate transfer learning with the supervision of iteratively updated pseudo-ground truth labels, bridging the domain gap between the synthetic and the real. In addition, we propose a contrastive flow loss on reference features and the warped features by pseudo ground truth flows, to further boost the accurate matching and dampen the mismatching due to motion, occlusion, or noisy pseudo labels. We adopt RAFT as the backbone and obtain an F1-all error of 4.11%, i.e. a 19% error reduction from RAFT (5.10%) and ranking 2$^{nd}$ place at submission on the KITTI 2015 benchmark. Our framework can also be extended to other models, e.g. CRAFT, reducing the F1-all error from 4.79% to 4.66% on KITTI 2015 benchmark.

Submitted 2 December, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

arXiv:2209.08427 [pdf, ps, other]

A Nearly Tight Lower Bound for the $d$-Dimensional Cow-Path Problem

Authors: Nikhil Bansal, John Kuszmaul, William Kuszmaul

Abstract: In the $d$-dimensional cow-path problem, a cow living in $\mathbb{R}^d$ must locate a $(d - 1)$-dimensional hyperplane $H$ whose location is unknown. The only way that the cow can find $H$ is to roam $\mathbb{R}^d$ until it intersects $H$. If the cow travels a total distance $s$ to locate a hyperplane $H$ whose distance from the origin was $r \ge 1$, then the cow is said to achieve competitive ratio $s / r$. It is a classic result that, in $\mathbb{R}^2$, the optimal (deterministic) competitive ratio is $9$. In $\mathbb{R}^3$, the optimal competitive ratio is known to be at most $\approx 13.811$. But in higher dimensions, the asymptotic relationship between $d$ and the optimal competitive ratio remains an open question. The best upper and lower bounds, due to Antoniadis et al., are $O(d^{3/2})$ and $Ω(d)$, leaving a gap of roughly $\sqrt{d}$. In this note, we achieve a stronger lower bound of $\tilde{Ω}(d^{3/2})$.

Submitted 17 September, 2022; originally announced September 2022.

arXiv:2208.11286 [pdf, ps, other]

Resolving Matrix Spencer Conjecture Up to Poly-logarithmic Rank

Authors: Nikhil Bansal, Haotian Jiang, Raghu Meka

Abstract: We give a simple proof of the matrix Spencer conjecture up to poly-logarithmic rank: given symmetric $d \times d$ matrices $A_1,\ldots,A_n$ each with $\|A_i\|_{\mathsf{op}} \leq 1$ and rank at most $n/\log^3 n$, one can efficiently find $\pm 1$ signs $x_1,\ldots,x_n$ such that their signed sum has spectral norm $\|\sum_{i=1}^n x_i A_i\|_{\mathsf{op}} = O(\sqrt{n})$. This result also implies a $\log n - Ω( \log \log n)$ qubit lower bound for quantum random access codes encoding $n$ classical bits with advantage $\gg 1/\sqrt{n}$. Our proof uses the recent refinement of the non-commutative Khintchine inequality in [Bandeira, Boedihardjo, van Handel, 2022] for random matrices with correlated Gaussian entries.

Submitted 29 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

arXiv:2206.11848 [pdf, other]

Obj2Sub: Unsupervised Conversion of Objective to Subjective Questions

Authors: Aarish Chhabra, Nandini Bansal, Venktesh V, Mukesh Mohania, Deep Dwivedi

Abstract: Exams are conducted to test the learner's understanding of the subject. To prevent the learners from guessing or exchanging solutions, the mode of tests administered must have sufficient subjective questions that can gauge whether the learner has understood the concept by mandating a detailed answer. Hence, in this paper, we propose a novel hybrid unsupervised approach leveraging rule-based methods and pre-trained dense retrievers for the novel task of automatically converting the objective questions to subjective questions. We observe that our approach outperforms the existing data-driven approaches by 36.45% as measured by Recall@k and Precision@k.

Submitted 25 May, 2022; originally announced June 2022.

arXiv:2206.10562 [pdf, other]

Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth

Authors: Nitin Bansal, Pan Ji, Junsong Yuan, Yi Xu

Abstract: The multi-task learning (MTL) paradigm focuses on jointly learning two or more tasks, aiming for significant improvement with respect to a model's generalizability, performance, and training/inference memory footprint. The aforementioned benefits become ever so indispensable in the case of joint training for vision-related dense prediction tasks. In this work, we tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called Cross-Channel Attention Module (CCAM), which facilitates effective feature sharing along each channel between the two tasks, leading to mutual performance gain with a negligible increase in trainable parameters. In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug. Finally, we validate the performance gain of the proposed method on the Cityscapes and ScanNet datasets, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantic segmentation.

Submitted 25 October, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2205.06558 [pdf, ps, other]

Balanced Allocations: The Heavily Loaded Case with Deletions

Authors: Nikhil Bansal, William Kuszmaul

Abstract: In the 2-choice allocation problem, $m$ balls are placed into $n$ bins, and each ball must choose between two random bins $i, j \in [n]$ that it has been assigned to. It has been known for more than two decades that if each ball follows the Greedy strategy (i.e., always pick the less-full bin), then the maximum load will be $m/n + O(\log \log n)$ with high probability in $n$ (and $m / n + O(\log m)$ with high probability in $m$). It has remained open whether the same bounds hold in the dynamic version of the same game, where balls are inserted/deleted with up to $m$ balls present at a time. We show that these bounds do not hold in the dynamic setting: already on $4$ bins, there exists a sequence of insertions/deletions that cause Greedy to incur a maximum load of $m/4 + Ω(\sqrt{m})$ with probability $Ω(1)$ -- this is the same bound as if each ball is simply assigned to a random bin! This raises the question of whether any 2-choice allocation strategy can offer a strong bound in the dynamic setting. Our second result answers this question in the affirmative: we present a new strategy, called ModulatedGreedy, that guarantees a maximum load of $m / n + O(\log m)$, at any given moment, with high probability in $m$. Generalizing ModulatedGreedy, we obtain dynamic guarantees for the $(1 + β)$-choice setting, and for the setting of balls-and-bins on a graph. Finally, we consider a setting in which balls can be reinserted after they are deleted, and where the pair $i, j$ that a given ball uses is consistent across insertions. This seemingly small modification renders tight load balancing impossible: on 4 bins, any strategy that is oblivious to the specific identities of balls must allow for a maximum load of $m/4 + poly(m)$ at some point in the first $poly(m)$ insertions/deletions, with high probability in $m$.
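
The insertion-only Greedy strategy described in this abstract is easy to simulate. The sketch below compares it with assigning each ball to a single random bin. The function names are hypothetical, and it covers only the static setting: it models no deletions and does not implement ModulatedGreedy, whose details the abstract does not give.

```python
import random

def greedy_two_choice(m, n, rng):
    """Insert m balls into n bins; each ball samples two bins and joins the less-full one."""
    loads = [0] * n
    for _ in range(m):
        i, j = rng.randrange(n), rng.randrange(n)
        target = i if loads[i] <= loads[j] else j
        loads[target] += 1
    return loads

def one_choice(m, n, rng):
    """Baseline: each ball goes to a single uniformly random bin."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return loads

def overload(loads):
    """Gap between the fullest bin and the average load m/n."""
    return max(loads) - sum(loads) / len(loads)

rng = random.Random(2022)
greedy_gap = overload(greedy_two_choice(10_000, 100, rng))
random_gap = overload(one_choice(10_000, 100, rng))
```

Printing `greedy_gap` and `random_gap` for a few seeds gives a feel for the gap between the two strategies in the static case, which is the regime where the abstract's $m/n + O(\log \log n)$ bound applies.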

Submitted 13 May, 2022; originally announced May 2022.

arXiv:2205.02930 [pdf, other]

FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras

Authors: Qingan Yan, Pan Ji, Nitin Bansal, Yuxin Ma, Yuan Tian, Yi Xu

Abstract: In this paper, we deal with the problem of monocular depth estimation for fisheye cameras in a self-supervised manner. A known issue of self-supervised depth estimation is that it suffers in low-light/over-exposure conditions and in large homogeneous regions. To tackle this issue, we propose a novel ordinal distillation loss that distills the ordinal information from a large teacher model. Such a teacher model, having been trained on a large amount of diverse data, can capture the depth ordering information well, but falls short in preserving accurate scene geometry. Combined with self-supervised losses, we show that our model can not only generate reasonable depth maps in challenging environments but also better recover the scene geometry. We further leverage the fisheye cameras of an AR-Glasses device to collect an indoor dataset to facilitate evaluation.

Submitted 5 May, 2022; originally announced May 2022.

arXiv:2205.01023 [pdf, ps, other]

A Unified Approach to Discrepancy Minimization

Authors: Nikhil Bansal, Aditi Laddha, Santosh S. Vempala

Abstract: We study a unified approach and algorithm for constructive discrepancy minimization based on a stochastic process. By varying the parameters of the process, one can recover various state-of-the-art results. We demonstrate the flexibility of the method by deriving a discrepancy bound for smoothed instances, which interpolates between known bounds for worst-case and random instances.

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2204.11427 [pdf, ps, other]

Smoothed Analysis of the Komlós Conjecture

Authors: Nikhil Bansal, Haotian Jiang, Raghu Meka, Sahil Singla, Makrand Sinha

Abstract: The well-known Komlós conjecture states that given $n$ vectors in $\mathbb{R}^d$ with Euclidean norm at most one, there always exists a $\pm 1$ coloring such that the $\ell_{\infty}$ norm of the signed-sum vector is a constant independent of $n$ and $d$. We prove this conjecture in a smoothed analysis setting where the vectors are perturbed by adding a small Gaussian noise and when the number of vectors $n = ω(d\log d)$. The dependence of $n$ on $d$ is the best possible even in a completely random setting. Our proof relies on a weighted second moment method, where instead of considering uniformly random colorings we apply the second moment method on an implicit distribution on colorings obtained by applying the Gram-Schmidt walk algorithm to a suitable set of vectors. The main technical idea is to use various properties of these colorings, including subgaussianity, to control the second moment.

Submitted 25 April, 2022; originally announced April 2022.

Comments: ICALP 2022

arXiv:2203.12082 [pdf, other]

PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo

Authors: Jiachen Liu, Pan Ji, Nitin Bansal, Changjiang Cai, Qingan Yan, Xiaolei Huang, Yi Xu

Abstract: We present a novel framework named PlaneMVS for 3D plane reconstruction from multiple input views with known camera poses. Most previous learning-based plane reconstruction methods reconstruct 3D planes from single images, which highly rely on single-view regression and suffer from depth scale ambiguity. In contrast, we reconstruct 3D planes with a multi-view-stereo (MVS) pipeline that takes advantage of multi-view geometry. We decouple plane reconstruction into a semantic plane detection branch and a plane MVS branch. The semantic plane detection branch is based on a single-view plane detection framework but with differences. The plane MVS branch adopts a set of slanted plane hypotheses to replace conventional depth hypotheses to perform a plane sweeping strategy and finally learns pixel-level plane parameters and its planar depth map. We present how the two branches are learned in a balanced way, and propose a soft-pooling loss to associate the outputs of the two branches and make them benefit from each other. Extensive experiments on various indoor datasets show that PlaneMVS significantly outperforms state-of-the-art (SOTA) single-view plane reconstruction methods on both plane detection and 3D geometry metrics. Our method even outperforms a set of SOTA learning-based MVS methods thanks to the learned plane priors. To the best of our knowledge, this is the first work on 3D plane reconstruction within an end-to-end MVS framework. Source code: https://github.com/oppo-us-research/PlaneMVS.

Submitted 5 June, 2024; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: CVPR 2022; source code: https://github.com/oppo-us-research/PlaneMVS

arXiv:2203.00212 [pdf, other]

Influence in Completely Bounded Block-multilinear Forms and Classical Simulation of Quantum Algorithms

Authors: Nikhil Bansal, Makrand Sinha, Ronald de Wolf

Abstract: The Aaronson-Ambainis conjecture (Theory of Computing '14) says that every low-degree bounded polynomial on the Boolean hypercube has an influential variable. This conjecture, if true, would imply that the acceptance probability of every $d$-query quantum algorithm can be well-approximated almost everywhere (i.e., on almost all inputs) by a $\mathrm{poly}(d)$-query classical algorithm. We prove a special case of the conjecture: in every completely bounded degree-$d$ block-multilinear form with constant variance, there always exists a variable with influence at least $1/\mathrm{poly}(d)$. In a certain sense, such polynomials characterize the acceptance probability of quantum query algorithms, as shown by Arunachalam, Briët and Palazuelos (SICOMP '19). As a corollary we obtain efficient classical almost-everywhere simulation for a particular class of quantum algorithms that includes for instance $k$-fold Forrelation. Our main technical result relies on connections to free probability theory.

Submitted 28 February, 2022; originally announced March 2022.

Comments: 21 pages, 2 figures

arXiv:2202.02217 [pdf, ps, other]

Flow Time Scheduling and Prefix Beck-Fiala

Authors: Nikhil Bansal, Lars Rohwedder, Ola Svensson

Abstract: We relate discrepancy theory with the classic scheduling problems of minimizing max flow time and total flow time on unrelated machines. Specifically, we give a general reduction that allows us to transfer discrepancy bounds in the prefix Beck-Fiala (bounded $\ell_1$-norm) setting to bounds on the flow time of an optimal schedule. Combining our reduction with a deep result proved by Banaszczyk via convex geometry, we obtain guarantees of $O(\sqrt{\log n})$ and $O(\sqrt{\log n} \log P)$ for max flow time and total flow time, respectively, improving upon the previous best guarantees of $O(\log n)$ and $O(\log n \log P)$. Apart from the improved guarantees, the reduction motivates seemingly easy versions of prefix discrepancy questions: any constant bound on prefix Beck-Fiala where vectors have sparsity two (sparsity one being trivial) would already yield tight guarantees for both max flow time and total flow time. While known techniques solve this case when the entries take values in $\{-1,0,1\}$, we show that they are unlikely to transfer to the more general $2$-sparse case of bounded $\ell_1$-norm.

Submitted 4 February, 2022; originally announced February 2022.

Comments: An extended abstract will appear in the proceedings of STOC'22

arXiv:2201.05294 [pdf, other]

Multi-Narrative Semantic Overlap Task: Evaluation and Benchmark

Authors: Naman Bansal, Mousumi Akter, Shubhra Kanti Karmaker Santu

Abstract: In this paper, we introduce an important yet relatively unexplored NLP task called Multi-Narrative Semantic Overlap (MNSO), which entails generating a Semantic Overlap of multiple alternate narratives. As no benchmark dataset is readily available for this task, we created one by crawling 2,925 narrative pairs from the web and then went through the tedious process of manually creating 411 different ground-truth semantic overlaps by engaging human annotators. As a way to evaluate this novel task, we first conducted a systematic study by borrowing the popular ROUGE metric from text-summarization literature and discovered that ROUGE is not suitable for our task. Subsequently, we conducted further human annotations/validations to create 200 document-level and 1,518 sentence-level ground-truth labels which helped us formulate a new precision-recall style evaluation metric, called SEM-F1 (semantic F1). Experimental results show that the proposed SEM-F1 metric yields higher correlation with human judgement as well as higher inter-rater agreement compared to the ROUGE metric.

Submitted 13 January, 2022; originally announced January 2022.

arXiv:2111.15169 [pdf, other]

Online metric allocation

Authors: Nikhil Bansal, Christian Coester

Abstract: We introduce a natural online allocation problem that connects several of the most fundamental problems in online optimization. Let $M$ be an $n$-point metric space. Consider a resource that can be allocated in arbitrary fractions to the points of $M$. At each time $t$, a convex monotone cost function $c_t: [0,1]\to\mathbb{R}_+$ appears at some point $r_t\in M$. In response, an algorithm may change the allocation of the resource, paying movement cost as determined by the metric and service cost $c_t(x_{r_t})$, where $x_{r_t}$ is the fraction of the resource at $r_t$ at the end of time $t$. For example, when the cost functions are $c_t(x)=αx$, this is equivalent to randomized MTS, and when the cost functions are $c_t(x)=\infty\cdot 1_{x<1/k}$, this is equivalent to fractional $k$-server. We give an $O(\log n)$-competitive algorithm for weighted star metrics. Due to the generality of allowed cost functions, classical multiplicative update algorithms do not work for the metric allocation problem. A key idea of our algorithm is to decouple the rate at which a variable is updated from its value, resulting in interesting new dynamics. This can be viewed as running mirror descent with a time-varying regularizer, and we use this perspective to further refine the guarantees of our algorithm. The standard analysis techniques run into multiple complications when the regularizer is time-varying, and we show how to overcome these issues by making various modifications to the default potential function.
We also consider the problem when cost functions are allowed to be non-convex. In this case, we give tight bounds of $Θ(n)$ on tree metrics, which imply deterministic and randomized competitive ratios of $O(n^2)$ and $O(n\log n)$ respectively on arbitrary metrics. Our algorithm is based on an $\ell_2^2$-regularizer.

Submitted 30 November, 2021; originally announced November 2021.

arXiv:2111.07049 [pdf, other]

Prefix Discrepancy, Smoothed Analysis, and Combinatorial Vector Balancing

Authors: Nikhil Bansal, Haotian Jiang, Raghu Meka, Sahil Singla, Makrand Sinha

Abstract: A well-known result of Banaszczyk in discrepancy theory concerns the prefix discrepancy problem (also known as the signed series problem): given a sequence of $T$ unit vectors in $\mathbb{R}^d$, find $\pm$ signs for each of them such that the signed sum vector along any prefix has a small $\ell_\infty$-norm. This problem is central to proving upper bounds for the Steinitz problem, and the popular Komlós problem is a special case where one is only concerned with the final signed sum vector instead of all prefixes. Banaszczyk gave an $O(\sqrt{\log d+ \log T})$ bound for the prefix discrepancy problem. We investigate the tightness of Banaszczyk's bound and consider natural generalizations of prefix discrepancy: We first consider a smoothed analysis setting, where a small amount of additive noise perturbs the input vectors. We show an exponential improvement in $T$ compared to Banaszczyk's bound. Using a primal-dual approach and a careful chaining argument, we show that one can achieve a bound of $O(\sqrt{\log d+ \log\!\log T})$ with high probability in the smoothed setting. Moreover, this smoothed analysis bound is the best possible without further improvement on Banaszczyk's bound in the worst case. We also introduce a generalization of the prefix discrepancy problem where the discrepancy constraints correspond to paths on a DAG on $T$ vertices. We show that an analog of Banaszczyk's $O(\sqrt{\log d+ \log T})$ bound continues to hold in this setting for adversarially given unit vectors and that the $\sqrt{\log T}$ factor is unavoidable for DAGs.
We also show that the dependence on $T$ cannot be improved significantly in the smoothed case for DAGs. We conclude by exploring a more general notion of vector balancing, which we call combinatorial vector balancing. We obtain near-optimal bounds in this setting, up to poly-logarithmic factors.

Submitted 13 November, 2021; originally announced November 2021.

Comments: 22 pages. Appears in ITCS 2022

arXiv:2106.06051 [pdf, other]

The Power of Two Choices in Graphical Allocation

Authors: Nikhil Bansal, Ohad Feldheim

Abstract: The graphical balls-into-bins process is a generalization of the classical 2-choice balls-into-bins process, where the bins correspond to vertices of an arbitrary underlying graph $G$. At each time step an edge of $G$ is chosen uniformly at random, and a ball must be assigned to either of the two endpoints of this edge. The standard 2-choice process corresponds to the case of $G=K_n$. For any $k(n)$-edge-connected, $d(n)$-regular graph on $n$ vertices, and any number of balls, we give an allocation strategy that, with high probability, ensures a gap of $O((d/k) \log^4\hspace{-1pt}n \log \log n)$, between the load of any two bins. In particular, this implies polylogarithmic bounds for natural graphs such as cycles and tori, for which the classical greedy allocation strategy is conjectured to have a polynomial gap between the bin loads. For every graph $G$, we also show an $Ω((d/k) + \log n)$ lower bound on the gap achievable by any allocation strategy. This implies that our strategy achieves the optimal gap, up to polylogarithmic factors, for every graph $G$. Our allocation algorithm is simple to implement and requires only $O(\log(n))$ time per allocation. It can be viewed as a more global version of the greedy strategy that compares average load on certain fixed sets of vertices, rather than on individual vertices. A key idea is to relate the problem of designing a good allocation strategy to that of finding suitable multi-commodity flows. To this end, we consider Räcke's cut-based decomposition tree and define certain orthogonal flows on it.

Submitted 19 November, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

MSC Class: 60C05; ACM Class: F.2; G.2

arXiv:2011.09076 [pdf, other]

Learning-Augmented Weighted Paging

Authors: Nikhil Bansal, Christian Coester, Ravi Kumar, Manish Purohit, Erik Vee

Abstract: We consider a natural semi-online model for weighted paging, where at any time the algorithm is given predictions, possibly with errors, about the next arrival of each page. The model is inspired by Belady's classic optimal offline algorithm for unweighted paging, and extends the recently studied model for learning-augmented paging (Lykouris and Vassilvitskii, 2018) to the weighted setting. For the case of perfect predictions, we provide an $\ell$-competitive deterministic and an $O(\log \ell)$-competitive randomized algorithm, where $\ell$ is the number of distinct weight classes. Both these bounds are tight, and imply an $O(\log W)$- and $O(\log \log W)$-competitive ratio, respectively, when the page weights lie between $1$ and $W$. Previously, it was not known how to use these predictions in the weighted setting and only bounds of $k$ and $O(\log k)$ were known, where $k$ is the cache size. Our results also generalize to the interleaved paging setting and to the case of imperfect predictions, with the competitive ratios degrading smoothly from $O(\ell)$ and $O(\log \ell)$ to $O(k)$ and $O(\log k)$, respectively, as the prediction error increases. Our results are based on several insights on structural properties of Belady's algorithm and the sequence of page arrival predictions, and novel potential functions that incorporate these predictions. For the case of unweighted paging, the results imply a very simple potential function based proof of the optimality of Belady's algorithm, which may be of independent interest.

Submitted 9 November, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

arXiv:2011.07097 [pdf, ps, other]

Some remarks on hypergraph matching and the Füredi-Kahn-Seymour conjecture

Authors: Nikhil Bansal, David G. Harris

Abstract: A classic conjecture of Füredi, Kahn and Seymour (1993) states that given any hypergraph with non-negative edge weights $w(e)$, there exists a matching $M$ such that $\sum_{e \in M} (|e|-1+1/|e|)\, w(e) \geq w^*$, where $w^*$ is the value of an optimum fractional matching. We show the conjecture is true for rank-3 hypergraphs, and is achieved by a natural iterated rounding algorithm. While the general conjecture remains open, we give several new improved bounds. In particular, we show that the iterated rounding algorithm gives $\sum_{e \in M} (|e|-δ(e))\, w(e) \geq w^*$, where $δ(e) = |e|/(|e|^2+|e|-1)$, improving upon the baseline guarantee of $\sum_{e \in M} |e|\,w(e) \geq w^*$.

Submitted 8 March, 2022; v1 submitted 13 November, 2020; originally announced November 2020.

Journal ref: Random Structures & Algorithms 62(1), pp. 52-67 (2023)

arXiv:2008.07003 [pdf, other]

$k$-Forrelation Optimally Separates Quantum and Classical Query Complexity

Authors: Nikhil Bansal, Makrand Sinha

Abstract: Aaronson and Ambainis (SICOMP '18) showed that any partial function on $N$ bits that can be computed with an advantage $δ$ over a random guess by making $q$ quantum queries, can also be computed classically with an advantage $δ/2$ by a randomized decision tree making ${O}_q(N^{1-\frac{1}{2q}}δ^{-2})$ queries. Moreover, they conjectured the $k$-Forrelation problem -- a partial function that can be computed with $q = \lceil k/2 \rceil$ quantum queries -- to be a suitable candidate for exhibiting such an extremal separation. We prove their conjecture by showing a tight lower bound of $\widetilde{Ω}(N^{1-1/k})$ for the randomized query complexity of $k$-Forrelation, where the advantage $δ= 2^{-O(k)}$. By standard amplification arguments, this gives an explicit partial function that exhibits an $O_ε(1)$ vs $Ω(N^{1-ε})$ separation between bounded-error quantum and randomized query complexities, where $ε>0$ can be made arbitrarily small. Our proof also gives the same bound for the closely related but non-explicit $k$-Rorrelation function introduced by Tal (FOCS '20). Our techniques rely on classical Gaussian tools, in particular, Gaussian interpolation and Gaussian integration by parts, and in fact, give a more general statement. We show that to prove lower bounds for $k$-Forrelation against a family of functions, it suffices to bound the $\ell_1$-weight of the Fourier coefficients between levels $k$ and $(k-1)k$.
We also prove new interpolation and integration by parts identities that might be of independent interest in the context of rounding high-dimensional Gaussian vectors.

Submitted 17 November, 2020; v1 submitted 16 August, 2020; originally announced August 2020.

Comments: 40 pages, 2 figures. Change from v1 to v2: Updated figures to fix an Adobe Acrobat specific issue. Change from v0 to v1: Improved the advantage $δ$ to $2^{-O(k)}$ strengthening the main conclusions. Added a reference to the independent work of Sherstov, Storozhenko and Wu (arxiv:2008.10223) who obtained a similar lower bound for the randomized query complexity of $k$-Rorrelation

arXiv:2007.15709 [pdf, ps, other]

An Asymptotic Lower Bound for Online Vector Bin Packing

Authors: Nikhil Bansal, Ilan Reuven Cohen

Abstract: We consider the online vector bin packing problem where $n$ items specified by $d$-dimensional vectors must be packed in the fewest number of identical $d$-dimensional bins. Azar et al. (STOC'13) showed that for any online algorithm $A$, there exist instances $I$ such that $A(I)$, the number of bins used by $A$ to pack $I$, is $Ω(d/\log^2 d)$ times $OPT(I)$, the minimal number of bins to pack $I$. However in those instances, $OPT(I)$ was only $O(\log d)$, which left open the possibility of improved algorithms with better asymptotic competitive ratio when $OPT(I) \gg d$. We rule this out by showing that for any arbitrary function $q(\cdot)$ and any randomized online algorithm $A$, there exist instances $I$ such that $E[A(I)] \geq c\cdot d/\log^3d \cdot OPT(I) + q(d)$, for some universal constant $c$.

Submitted 4 August, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

arXiv:2007.10622 [pdf, other]

Online Discrepancy Minimization for Stochastic Arrivals

Authors: Nikhil Bansal, Haotian Jiang, Raghu Meka, Sahil Singla, Makrand Sinha

Abstract: In the stochastic online vector balancing problem, vectors $v_1,v_2,\ldots,v_T$ chosen independently from an arbitrary distribution in $\mathbb{R}^n$ arrive one-by-one and must be immediately given a $\pm$ sign. The goal is to keep the norm of the discrepancy vector, i.e., the signed prefix-sum, as small as possible for a given target norm. We consider some of the most well-known problems in discrepancy theory in the above online stochastic setting, and give algorithms that match the known offline bounds up to $\mathsf{polylog}(nT)$ factors. This substantially generalizes and improves upon the previous results of Bansal, Jiang, Singla, and Sinha (STOC '20). In particular, for the Komlós problem where $\|v_t\|_2\leq 1$ for each $t$, our algorithm achieves $\tilde{O}(1)$ discrepancy with high probability, improving upon the previous $\tilde{O}(n^{3/2})$ bound. For Tusnády's problem of minimizing the discrepancy of axis-aligned boxes, we obtain an $O(\log^{d+4} T)$ bound for arbitrary distributions over points. Previous techniques only worked for product distributions and gave a weaker $O(\log^{2d+1} T)$ bound. We also consider the Banaszczyk setting, where given a symmetric convex body $K$ with Gaussian measure at least $1/2$, our algorithm achieves $\tilde{O}(1)$ discrepancy with respect to the norm given by $K$ for input distributions with sub-exponential tails. Our key idea is to introduce a potential that also enforces constraints on how the discrepancy vector evolves, allowing us to maintain certain anti-concentration properties.
For the Banaszczyk setting, we further enhance this potential by combining it with ideas from generic chaining. Finally, we also extend these results to the setting of online multi-color discrepancy.

Submitted 21 July, 2020; originally announced July 2020.

arXiv:2007.09172 [pdf, ps, other]

Improved Approximations for Min Sum Vertex Cover and Generalized Min Sum Set Cover

Authors: Nikhil Bansal, Jatin Batra, Majid Farhadi, Prasad Tetali

Abstract: We study the generalized min sum set cover (GMSSC) problem, wherein given a collection of hyperedges $E$ with arbitrary covering requirements $k_e$, the goal is to find an ordering of the vertices to minimize the total cover time of the hyperedges; a hyperedge $e$ is considered covered by the first time when $k_e$ many of its vertices appear in the ordering. We give a $4.642$ approximation algorithm for GMSSC, coming close to the best possible bound of $4$, already for the classical special case (with all $k_e=1$) of min sum set cover (MSSC) studied by Feige, Lovász and Tetali, and improving upon the previous best known bound of $12.4$ due to Im, Sviridenko and van der Zwaan. Our algorithm is based on transforming the LP solution by a suitable kernel and applying randomized rounding. This also gives an LP-based $4$ approximation for MSSC. As part of the analysis of our algorithm, we also derive an inequality on the lower tail of a sum of independent Bernoulli random variables, which might be of independent interest and broader utility. Another well-known special case is the min sum vertex cover (MSVC) problem, in which the input hypergraph is a graph and $k_e = 1$, for every edge. We give a $16/9$ approximation for MSVC, and show a matching integrality gap for the natural LP relaxation. This improves upon the previous best $1.999946$ approximation of Barenholz, Feige and Peleg. (The claimed $1.79$ approximation result of Iwata, Tetali and Tripathi for the MSVC turned out to have an unfortunate, seemingly unfixable, mistake in it.)
Finally, we revisit MSSC and consider the $\ell_p$ norm of cover-time of the hyperedges. Using a dual fitting argument, we show that the natural greedy algorithm achieves tight, up to NP-hardness, approximation guarantees of $(p+1)^{1+1/p}$, for all $p\ge 1$. For $p=1$, this gives yet another proof of the $4$ approximation for MSSC.

Submitted 17 July, 2020; originally announced July 2020.

Comments: 28 pages

arXiv:2003.08754 [pdf, other]

SAM: The Sensitivity of Attribution Methods to Hyperparameters

Authors: Naman Bansal, Chirag Agarwal, Anh Nguyen

Abstract: Attribution methods can provide powerful insights into the reasons for a classifier's decision. We argue that a key desideratum of an explanation method is its robustness to input hyperparameters which are often randomly set or empirically tuned. High sensitivity to arbitrary hyperparameter choices does not only impede reproducibility but also questions the correctness of an explanation and impairs the trust of end-users. In this paper, we provide a thorough empirical study on the sensitivity of existing attribution methods. We found an alarming trend that many methods are highly sensitive to changes in their common hyperparameters e.g. even changing a random seed can yield a different explanation! Interestingly, such sensitivity is not reflected in the average explanation accuracy scores over the dataset as commonly reported in the literature. In addition, explanations generated for robust classifiers (i.e. which are trained to be invariant to pixel-wise perturbations) are surprisingly more robust than those generated for regular classifiers.

Submitted 12 April, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

Comments: Oral paper at CVPR 2020

arXiv:1912.03350 [pdf, other]

Online Vector Balancing and Geometric Discrepancy

Authors: Nikhil Bansal, Haotian Jiang, Sahil Singla, Makrand Sinha

Abstract: We consider an online vector balancing question where $T$ vectors, chosen from an arbitrary distribution over $[-1,1]^n$, arrive one-by-one and must be immediately given a $\pm$ sign. The goal is to keep the discrepancy as small as possible. A concrete example is the online interval discrepancy problem where $T$ points are sampled uniformly in $[0,1]$, and the goal is to immediately color them $\pm$ such that every sub-interval remains nearly balanced. As random coloring incurs $Ω(T^{1/2})$ discrepancy, while the offline bounds are $Θ(\sqrt{n \log (T/n)})$ for vector balancing and $1$ for interval balancing, a natural question is whether one can (nearly) match the offline bounds in the online setting for these problems. One must utilize the stochasticity as in the worst-case scenario it is known that discrepancy is $Ω(T^{1/2})$ for any online algorithm. Bansal and Spencer recently showed an $O(\sqrt{n}\log T)$ bound when each coordinate is independent. When there are dependencies among the coordinates, the problem becomes much more challenging, as evidenced by a recent work of Jiang, Kulkarni, and Singla that gives a non-trivial $O(T^{1/\log\log T})$ bound for online interval discrepancy. Although this beats random coloring, it is still far from the offline bound. In this work, we introduce a new framework for online vector balancing when the input distribution has dependencies across coordinates.
This lets us obtain a $poly(n, \log T)$ bound for online vector balancing under arbitrary input distributions, and a $poly(\log T)$ bound for online interval discrepancy. Our framework is powerful enough to capture other well-studied geometric discrepancy problems; e.g., a $poly(\log^d (T))$ bound for the online $d$-dimensional Tusnády's problem. A key new technical ingredient is an anti-concentration inequality for sums of pairwise uncorrelated random variables.

Submitted 12 April, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

Comments: Appears in STOC 2020

arXiv:1907.05473 [pdf, ps, other]

Non-uniform Geometric Set Cover and Scheduling on Multiple Machines

Authors: Nikhil Bansal, Jatin Batra

Abstract: We consider the following general scheduling problem studied recently by Moseley. There are $n$ jobs, all released at time $0$, where job $j$ has size $p_j$ and an associated arbitrary non-decreasing cost function $f_j$ of its completion time. The goal is to find a schedule on $m$ machines with minimum total cost. We give an $O(1)$ approximation for the problem, improving upon the previous $O(\log \log nP)$ bound ($P$ is the maximum to minimum size ratio), and resolving the open question of Moseley. We first note that the scheduling problem can be reduced to a clean geometric set cover problem where points on a line, with arbitrary demands, must be covered by a minimum cost collection of given intervals with non-uniform capacity profiles. Unfortunately, current techniques for such problems based on knapsack cover inequalities and low union complexity completely lose the geometric structure in the non-uniform capacity profiles and incur at least an $Ω(\log\log P)$ loss. To this end, we consider general covering problems with non-uniform capacities, and give a new method to handle capacities in a way that completely preserves their geometric structure. This allows us to use sophisticated geometric ideas in a black-box way to avoid the $Ω(\log \log P)$ loss in previous approaches. In addition to the scheduling problem above, we use this approach to obtain $O(1)$ or inverse Ackermann type bounds for several basic capacitated covering problems.

Submitted 17 July, 2020; v1 submitted 11 July, 2019; originally announced July 2019.

arXiv:1905.04610 [pdf, other]

Explainable AI for Trees: From Local Explanations to Global Understanding

Authors: Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, Su-In Lee

Abstract: Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.

Submitted 11 May, 2019; originally announced May 2019.

arXiv:1905.01495 [pdf, ps, other]

New Notions and Constructions of Sparsification for Graphs and Hypergraphs

Authors: Nikhil Bansal, Ola Svensson, Luca Trevisan

Abstract: A sparsifier of a graph $G$ (Benczúr and Karger; Spielman and Teng) is a sparse weighted subgraph $\tilde G$ that approximately retains the cut structure of $G$. For general graphs, non-trivial sparsification is possible only by using weighted graphs in which different edges have different weights. Even for graphs that admit unweighted sparsifiers, there are no known polynomial time algorithms that find such unweighted sparsifiers. We study a weaker notion of sparsification suggested by Oveis Gharan, in which the number of edges in each cut $(S,\bar S)$ is not approximated within a multiplicative factor $(1+ε)$, but is, instead, approximated up to an additive term bounded by $ε$ times $d\cdot |S| + \text{vol}(S)$, where $d$ is the average degree, and $\text{vol}(S)$ is the sum of the degrees of the vertices in $S$. We provide a probabilistic polynomial time construction of such sparsifiers for every graph, and our sparsifiers have a near-optimal number of edges $O(ε^{-2} n {\rm polylog}(1/ε))$. We also provide a deterministic polynomial time construction that constructs sparsifiers with a weaker property having the optimal number of edges $O(ε^{-2} n)$. Our constructions also satisfy a spectral version of the ``additive sparsification'' property. Our construction of ``additive sparsifiers'' with $O_ε(n)$ edges also works for hypergraphs, and provides the first non-trivial notion of sparsification for hypergraphs achievable with $O(n)$ hyperedges when $ε$ and the rank $r$ of the hyperedges are constant.
Finally, we provide a new construction of spectral hypergraph sparsifiers, according to the standard definition, with ${\rm poly}(ε^{-1},r)\cdot n\log n$ hyperedges, improving over the previous spectral construction (Soma and Yoshida) that used $\tilde O(n^3)$ hyperedges even for constant $r$ and $ε$.

Submitted 4 May, 2019; originally announced May 2019.

Comments: 31 pages

arXiv:1903.06898 [pdf, ps, other]

On-Line Balancing of Random Inputs

Authors: Nikhil Bansal, Joel H. Spencer

Abstract: We consider an online vector balancing game where vectors $v_t$, chosen uniformly at random in $\{-1,+1\}^n$, arrive over time and a sign $x_t \in \{-1,+1\}$ must be picked immediately upon the arrival of $v_t$. The goal is to minimize the $L^\infty$ norm of the signed sum $\sum_t x_t v_t$. We give an online strategy for picking the signs $x_t$ that has value $O(n^{1/2})$ with high probability. Up to constants, this is the best possible even when the vectors are given in advance.

Submitted 12 July, 2020; v1 submitted 16 March, 2019; originally announced March 2019.

Comments: 13 pages
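
The game described in the abstract above can be simulated in a few lines. The sketch below is only an illustration of the setup: it uses a simple greedy rule (pick the sign that keeps the Euclidean norm of the running sum small), which is an assumption made here and is not the strategy analysed in the paper. The function name `play_balancing_game` and its parameters are hypothetical.

```python
import random


def play_balancing_game(n, steps, seed=0):
    """Simulate the online vector balancing game with a greedy sign rule.

    Each round a vector v_t is drawn uniformly from {-1, +1}^n and a sign
    x_t in {-1, +1} is chosen immediately. The greedy rule used here (an
    illustrative assumption, not the paper's strategy) picks x_t so that
    x_t * <running_sum, v_t> <= 0.
    Returns the chosen signs and the L-infinity norm of the final signed sum.
    """
    rng = random.Random(seed)
    running_sum = [0] * n
    signs = []
    for _ in range(steps):
        v = [rng.choice((-1, 1)) for _ in range(n)]
        dot = sum(s * c for s, c in zip(running_sum, v))
        x = -1 if dot > 0 else 1
        signs.append(x)
        running_sum = [s + x * c for s, c in zip(running_sum, v)]
    return signs, max(abs(s) for s in running_sum)


if __name__ == "__main__":
    signs, value = play_balancing_game(n=8, steps=100, seed=1)
    print("L-infinity norm of the signed sum:", value)
```

With this rule the squared Euclidean norm of the running sum grows by at most $n$ per round, so the returned value is at most $\sqrt{n \cdot \text{steps}}$; this is a much weaker guarantee than the $O(n^{1/2})$ bound stated in the abstract.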

arXiv:1812.07769 [pdf, other]

Sticky Brownian Rounding and its Applications to Constraint Satisfaction Problems

Authors: Sepehr Abbasi-Zadeh, Nikhil Bansal, Guru Guruganesh, Aleksandar Nikolov, Roy Schwartz, Mohit Singh

Abstract: Semidefinite programming is a powerful tool in the design and analysis of approximation algorithms for combinatorial optimization problems. In particular, the random hyperplane rounding method of Goemans and Williamson has been extensively studied for more than two decades, resulting in various extensions to the original technique and beautiful algorithms for a wide range of applications. Despite the fact that this approach yields tight approximation guarantees for some problems, e.g., Max-Cut, for many others, e.g., Max-SAT and Max-DiCut, the tight approximation ratio is still unknown. One of the main reasons for this is the fact that very few techniques for rounding semidefinite relaxations are known. In this work, we present a new general and simple method for rounding semidefinite programs, based on Brownian motion. Our approach is inspired by recent results in algorithmic discrepancy theory. We develop and present tools for analyzing our new rounding algorithms, utilizing mathematical machinery from the theory of Brownian motion, complex analysis, and partial differential equations. Focusing on constraint satisfaction problems, we apply our method to several classical problems, including Max-Cut, Max-2SAT, and Max-DiCut, and derive new algorithms that are competitive with the best known results.
To illustrate the versatility and general applicability of our approach, we give new approximation algorithms for the Max-Cut problem with side constraints that crucially utilize measure concentration results for the Sticky Brownian Motion, a feature missing from hyperplane rounding and its generalizations.

Submitted 19 October, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

arXiv:1812.06407

Unified Graph based Multi-Cue Feature Fusion for Robust Visual Tracking

Authors: Kapil Sharma, Himanshu Ahuja, Ashish Kumar, Nipun Bansal, Gurjit Singh Walia

Abstract: Visual Tracking is a complex problem due to unconstrained appearance variations and dynamic environment. Extraction of complementary information from the object environment via multiple features and adaptation to the target's appearance variations are the key problems of this work. To this end, we propose a robust object tracking framework based on Unified Graph Fusion (UGF) of multi-cue to adapt to the object's appearance. The proposed cross-diffusion of sparse and dense features not only suppresses the individual feature deficiencies but also extracts the complementary information from multi-cue. This iterative process builds robust unified features which are invariant to object deformations, fast motion, and occlusion. Robustness of the unified feature also enables the random forest classifier to precisely distinguish the foreground from the background, adding resilience to background clutter. In addition, we present a novel kernel-based adaptation strategy using outlier detection and a transductive reliability metric.

Submitted 23 May, 2019; v1 submitted 16 December, 2018; originally announced December 2018.

Comments: The information on this paper is not complete in entirety and may be misinterpreted

arXiv:1811.01597 [pdf, ps, other]

On a generalization of iterated and randomized rounding

Authors: Nikhil Bansal

Abstract: We give a general method for rounding linear programs that combines the commonly used iterated rounding and randomized rounding techniques. In particular, we show that whenever iterated rounding can be applied to a problem with some slack, there is a randomized procedure that returns an integral solution that satisfies the guarantees of iterated rounding and also has concentration properties. We use this to give new results for several classic problems where iterated rounding has been useful.

Submitted 18 July, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

arXiv:1810.09102 [pdf, other]

Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?

Authors: Nitin Bansal, Xiaohan Chen, Zhangyang Wang

Abstract: This paper seeks to answer the question: as the (near-) orthogonality of weights is found to be a favorable property for training deep convolutional neural networks, how can we enforce it in more effective and easy-to-use ways? We develop novel orthogonality regularizations on training deep CNNs, utilizing various advanced analytical tools such as mutual coherence and restricted isometry property. These plug-and-play regularizations can be conveniently incorporated into training almost any CNN without extra hassle. We then benchmark their effects on state-of-the-art models: ResNet, WideResNet, and ResNeXt, on several of the most popular computer vision datasets: CIFAR-10, CIFAR-100, SVHN and ImageNet. We observe consistent performance gains after applying those proposed regularizations, in terms of both the final accuracies achieved, and faster and more stable convergence. We have made our codes and pre-trained models publicly available: https://github.com/nbansal90/Can-we-Gain-More-from-Orthogonality.

Submitted 22 October, 2018; originally announced October 2018.

Comments: 11 pages, 1 figure, 2 tables. Accepted in NIPS 2018

arXiv:1810.03374 [pdf, ps, other]

On the discrepancy of random low degree set systems

Authors: Nikhil Bansal, Raghu Meka

Abstract: Motivated by the celebrated Beck-Fiala conjecture, we consider the random setting where there are $n$ elements and $m$ sets and each element lies in $t$ randomly chosen sets. In this setting, Ezra and Lovett showed an $O((t \log t)^{1/2})$ discrepancy bound in the regime when $n \leq m$ and an $O(1)$ bound when $n \gg m^t$. In this paper, we give a tight $O(\sqrt{t})$ bound for the entire range of $n$ and $m$, under a mild assumption that $t = Ω(\log \log m)^2$. The result is based on two steps. First, applying the partial coloring method to the case when $n = m \log^{O(1)} m$ and using the properties of the random set system we show that the overall discrepancy incurred is at most $O(\sqrt{t})$. Second, we reduce the general case to that of $n \leq m \log^{O(1)}m$ using LP duality and a careful counting argument.

Submitted 8 October, 2018; originally announced October 2018.

arXiv:1809.04355 [pdf, other]

Packing Sporadic Real-Time Tasks on Identical Multiprocessor Systems

Authors: Jian-Jia Chen, Nikhil Bansal, Samarjit Chakraborty, Georg von der Brüggen

Abstract: In real-time systems, in addition to functional correctness, recurrent tasks must fulfill timing constraints to ensure the correct behavior of the system. Partitioned scheduling is widely used in real-time systems, i.e., the tasks are statically assigned onto processors while ensuring that all timing constraints are met. The decision version of the problem, which is to check whether the deadline constraints of tasks can be satisfied on a given number of identical processors, is known to be ${\cal NP}$-complete in the strong sense. Several studies on this problem are based on approximations involving resource augmentation, i.e., speeding up individual processors. This paper studies another type of resource augmentation by allocating additional processors, a topic that has not been explored until recently. We provide polynomial-time algorithms and analysis, in which the approximation factors are dependent upon the input instances. Specifically, the factors are related to the maximum ratio of the period to the relative deadline of a task in the given task set. We also show that these algorithms unfortunately cannot achieve a constant approximation factor for general cases. Furthermore, we prove that the problem does not admit any asymptotic polynomial-time approximation scheme (APTAS) unless ${\cal P}={\cal NP}$ when the task set has constrained deadlines, i.e., the relative deadline of a task is no more than the period of the task.

Submitted 12 September, 2018; originally announced September 2018.

Comments: Accepted and to appear in ISAAC 2018, Yi-Lan, Taiwan

arXiv:1712.04581 [pdf, other]

Potential-Function Proofs for First-Order Methods

Authors: Nikhil Bansal, Anupam Gupta

Abstract: This note discusses proofs for convergence of first-order methods based on simple potential-function arguments. We cover methods like gradient descent (for both smooth and non-smooth settings), mirror descent, and some accelerated variants.

Submitted 2 June, 2019; v1 submitted 12 December, 2017; originally announced December 2017.
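
As an illustration of what a potential-function argument looks like for the entry above, the following minimal sketch checks numerically that one standard potential is non-increasing along gradient descent. The assumptions are all made here rather than taken from the listing: the objective is a convex quadratic $f(x) = \frac{1}{2}\sum_i a_i x_i^2$ with minimizer $x^* = 0$, the step size is $1/L$ with $L = \max_i a_i$, and the potential is $\Phi_t = t\,(f(x_t) - f(x^*)) + \frac{L}{2}\|x_t - x^*\|^2$. The function name is hypothetical.

```python
def gradient_descent_potentials(a, x0, steps):
    """Run gradient descent on f(x) = 0.5 * sum(a_i * x_i^2) and record
    the potential Phi_t = t * f(x_t) + (L / 2) * ||x_t||^2 at every step.

    Assumes all a_i >= 0 (so f is convex and L-smooth with L = max(a)),
    that max(a) > 0, and that the minimizer is x* = 0 with f(x*) = 0.
    """
    smoothness = max(a)
    x = list(x0)
    potentials = []
    for t in range(steps + 1):
        f_value = 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
        dist_sq = sum(xi * xi for xi in x)
        potentials.append(t * f_value + 0.5 * smoothness * dist_sq)
        # Gradient step with step size 1 / L; the gradient is (a_i * x_i).
        x = [xi - (ai * xi) / smoothness for ai, xi in zip(a, x)]
    return potentials


if __name__ == "__main__":
    pots = gradient_descent_potentials([1.0, 4.0, 0.25], [3.0, -2.0, 5.0], 50)
    print("potential never increases:",
          all(later <= earlier + 1e-9 for earlier, later in zip(pots, pots[1:])))
```

Since $\Phi_T \le \Phi_0$ and the distance term is non-negative, this gives $f(x_T) - f(x^*) \le \frac{L\,\|x_0 - x^*\|^2}{2T}$, the usual $O(1/T)$ rate for smooth convex gradient descent.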

arXiv:1708.03515 [pdf, other]

New Tools and Connections for Exponential-time Approximation

Authors: Nikhil Bansal, Parinya Chalermsook, Bundit Laekhanukit, Danupon Nanongkai, Jesper Nederlof

Abstract: In this paper, we develop new tools and connections for exponential time approximation. In this setting, we are given a problem instance and a parameter $α>1$, and the goal is to design an $α$-approximation algorithm with the fastest possible running time. We show the following results: - An $r$-approximation for maximum independent set in $O^*(\exp(\tilde O(n/r \log^2 r+r\log^2r)))$ time, - An $r$-approximation for chromatic number in $O^*(\exp(\tilde{O}(n/r \log r+r\log^2r)))$ time, - A $(2-1/r)$-approximation for minimum vertex cover in $O^*(\exp(n/r^{Ω(r)}))$ time, and - A $(k-1/r)$-approximation for minimum $k$-hypergraph vertex cover in $O^*(\exp(n/(kr)^{Ω(kr)}))$ time. (Throughout, $\tilde O$ and $O^*$ omit $\mathrm{polyloglog}(r)$ and factors polynomial in the input size, respectively.) The best known time bounds for all problems were $O^*(2^{n/r})$ [Bourgeois et al. 2009, 2011 & Cygan et al. 2008]. For maximum independent set and chromatic number, these bounds were complemented by $\exp(n^{1-o(1)}/r^{1+o(1)})$ lower bounds (under the Exponential Time Hypothesis (ETH)) [Chalermsook et al., 2013 & Laekhanukit, 2014 (Ph.D. Thesis)]. Our results show that the natural-looking $O^*(2^{n/r})$ bounds are not tight for all these problems. The key to these algorithmic results is a sparsification procedure, allowing the use of better approximation algorithms for bounded degree graphs. For obtaining the first two results, we introduce a new randomized branching rule.
Finally, we show a connection between PCP parameters and exponential-time approximation algorithms. This connection, together with our independent set algorithm, refutes the possibility of overly reducing the size of Chan's PCP [Chan, 2016]. It also implies that a (significant) improvement over our result will refute the gap-ETH conjecture [Dinur 2016 & Manurangsi and Raghavendra, 2016].

Submitted 11 August, 2017; originally announced August 2017.

Comments: 13 pages

arXiv:1708.01079 [pdf, ps, other]

The Gram-Schmidt Walk: A Cure for the Banaszczyk Blues

Authors: Nikhil Bansal, Daniel Dadush, Shashwat Garg, Shachar Lovett

Abstract: An important result in discrepancy due to Banaszczyk states that for any set of $n$ vectors in $\mathbb{R}^m$ of $\ell_2$ norm at most $1$ and any convex body $K$ in $\mathbb{R}^m$ of Gaussian measure at least half, there exists a $\pm 1$ combination of these vectors which lies in $5K$. This result implies the best known bounds for several problems in discrepancy. Banaszczyk's proof of this result is non-constructive and a major open problem has been to give an efficient algorithm to find such a $\pm 1$ combination of the vectors. In this paper, we resolve this question and give an efficient randomized algorithm to find a $\pm 1$ combination of the vectors which lies in $cK$ for $c>0$ an absolute constant. This leads to new efficient algorithms for several problems in discrepancy theory.

Submitted 3 August, 2017; originally announced August 2017.

arXiv:1707.05527 [pdf, other]

Nested Convex Bodies are Chaseable

Authors: Nikhil Bansal, Martin Böhm, Marek Eliáš, Grigorios Koumoutsos, Seeun William Umboh

Abstract: In the Convex Body Chasing problem, we are given an initial point $v_0$ in $R^d$ and an online sequence of $n$ convex bodies $F_1, ..., F_n$. When we receive $F_i$, we are required to move inside $F_i$. Our goal is to minimize the total distance travelled. This fundamental online problem was first studied by Friedman and Linial (DCG 1993). They proved an $Ω(\sqrt{d})$ lower bound on the competitive ratio, and conjectured that a competitive ratio depending only on d is possible. However, despite much interest in the problem, the conjecture remains wide open. We consider the setting in which the convex bodies are nested: $F_1 \supset ... \supset F_n$. The nested setting is closely related to extending the online LP framework of Buchbinder and Naor (ESA 2005) to arbitrary linear constraints. Moreover, this setting retains much of the difficulty of the general setting and captures an essential obstacle in resolving Friedman and Linial's conjecture. In this work, we give the first $f(d)$-competitive algorithm for chasing nested convex bodies in $R^d$.

Submitted 18 July, 2017; originally announced July 2017.

arXiv:1707.04519 [pdf, ps, other]

Competitive Algorithms for Generalized k-Server in Uniform Metrics

Authors: Nikhil Bansal, Marek Elias, Grigorios Koumoutsos, Jesper Nederlof

Abstract: The generalized k-server problem is a far-reaching extension of the k-server problem with several applications. Here, each server $s_i$ lies in its own metric space $M_i$. A request is a k-tuple $r = (r_1,r_2,\dotsc,r_k)$ and to serve it, we need to move some server $s_i$ to the point $r_i \in M_i$, and the goal is to minimize the total distance traveled by the servers. Despite much work, no f(k)-competitive algorithm is known for the problem for k > 2 servers, even for special cases such as uniform metrics and lines. Here, we consider the problem in uniform metrics and give the first f(k)-competitive algorithms for general k. In particular, we obtain deterministic and randomized algorithms with competitive ratio $O(k 2^k)$ and $O(k^3 \log k)$ respectively. Our deterministic bound is based on a novel application of the polynomial method to online algorithms, and essentially matches the long-known lower bound of $2^k-1$. We also give a $2^{2^{O(k)}}$-competitive deterministic algorithm for weighted uniform metrics, which also essentially matches the recent doubly exponential lower bound for the problem.

Submitted 17 July, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

arXiv:1704.03318 [pdf, other]

Weighted k-Server Bounds via Combinatorial Dichotomies

Authors: Nikhil Bansal, Marek Elias, Grigorios Koumoutsos

Abstract: The weighted $k$-server problem is a natural generalization of the $k$-server problem where each server has a different weight. We consider the problem on uniform metrics, which corresponds to a natural generalization of paging. Our main result is a doubly exponential lower bound on the competitive ratio of any deterministic online algorithm, that essentially matches the known upper bounds for the problem and closes a large and long-standing gap. The lower bound is based on relating the weighted $k$-server problem to a certain combinatorial problem and proving a Ramsey-theoretic lower bound for it. This combinatorial connection also reveals several structural properties of low cost feasible solutions to serve a sequence of requests. We use this to show that the generalized Work Function Algorithm achieves an almost optimum competitive ratio, and to obtain new refined upper bounds on the competitive ratio for the case of $d$ different weight classes.

Submitted 7 September, 2017; v1 submitted 11 April, 2017; originally announced April 2017.

Comments: accepted to FOCS'17

arXiv:1612.02788 [pdf, ps, other]

Faster Space-Efficient Algorithms for Subset Sum, k-Sum and Related Problems

Authors: Nikhil Bansal, Shashwat Garg, Jesper Nederlof, Nikhil Vyas

Abstract: We present space efficient Monte Carlo algorithms that solve Subset Sum and Knapsack instances with $n$ items using $O^*(2^{0.86n})$ time and polynomial space, where the $O^*(\cdot)$ notation suppresses factors polynomial in the input size. Both algorithms assume random read-only access to random bits. Modulo this mild assumption, this resolves a long-standing open problem in exact algorithms for NP-hard problems. These results can be extended to solve Binary Linear Programming on $n$ variables with few constraints in a similar running time. We also show that for any constant $k\geq 2$, random instances of $k$-Sum can be solved using $O(n^{k-0.5}\,\mathrm{polylog}(n))$ time and $O(\log n)$ space, without the assumption of random access to random bits. Underlying these results is an algorithm that determines whether two given lists of length $n$ with integers bounded by a polynomial in $n$ share a common value. Assuming random read-only access to random bits, we show that this problem can be solved using $O(\log n)$ space significantly faster than the trivial $O(n^2)$ time algorithm if no value occurs too often in the same list.

Submitted 24 June, 2017; v1 submitted 8 December, 2016; originally announced December 2016.

Comments: 23 pages, 3 figures

arXiv:1611.01805 [pdf, ps, other]

Algorithmic Discrepancy Beyond Partial Coloring

Authors: Nikhil Bansal, Shashwat Garg

Abstract: The partial coloring method is one of the most powerful and widely used methods in combinatorial discrepancy problems. However, in many cases it leads to sub-optimal bounds as the partial coloring step must be iterated a logarithmic number of times, and the errors can add up in an adversarial way. We give a new and general algorithmic framework that overcomes the limitations of the partial coloring method and can be applied in a black-box manner to various problems. Using this framework, we give new improved bounds and algorithms for several classic problems in discrepancy. In particular, for Tusnády's problem, we give an improved $O(\log^2 n)$ bound for discrepancy of axis-parallel rectangles and more generally an $O_d(\log^dn)$ bound for $d$-dimensional boxes in $\mathbb{R}^d$. Previously, even non-constructively, the best bounds were $O(\log^{2.5} n)$ and $O_d(\log^{d+0.5}n)$ respectively. Similarly, for the Steinitz problem we give the first algorithm that matches the best known non-constructive bounds due to Banaszczyk [Banaszczyk 2012] in the $\ell_\infty$ case, and improves the previous algorithmic bounds substantially in the $\ell_2$ case. Our framework is based upon a substantial generalization of the techniques developed recently in the context of the Komlós discrepancy problem [BDG16].

Submitted 11 July, 2017; v1 submitted 6 November, 2016; originally announced November 2016.
