M-SET: Multi-Drone Swarm Intelligence Experimentation with Collision Avoidance Realism

Authors: Chuhao Qin, Alexander Robins, Callum Lillywhite-Roake, Adam Pearce, Hritik Mehta, Scott James, Tsz Ho Wong, Evangelos Pournaras

Abstract: Distributed sensing by cooperative drone swarms is crucial for several Smart City applications, such as traffic monitoring and disaster response. Using an indoor lab with inexpensive drones, a testbed supports complex and ambitious studies on these systems while maintaining low cost, rigor, and external validity. This paper introduces the Multi-drone Sensing Experimentation Testbed (M-SET), a novel platform designed to prototype, develop, test, and evaluate distributed sensing with swarm intelligence. M-SET addresses the limitations of existing testbeds that fail to emulate collisions, thus lacking realism in outdoor environments. By integrating a collision avoidance method based on a potential field algorithm, M-SET ensures collision-free navigation and sensing, further optimized via a multi-agent collective learning algorithm. Extensive evaluation demonstrates accurate energy consumption estimation and a low risk of collisions, providing a robust proof-of-concept. New insights show that M-SET has significant potential to support ambitious research with minimal cost, simplicity, and high sensing quality.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 7 pages, 7 figures. This work has been submitted to the IEEE conference

arXiv:2405.15682 [pdf, other]

The Road Less Scheduled

Authors: Aaron Defazio, Xingyu, Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available (https://github.com/facebookresearch/schedule_free).

Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.08879 [pdf, other]

A Diffused Background from Axion-like Particles in the Microwave Sky

Authors: Harsh Mehta, Suvodip Mukherjee

Abstract: The nature of dark matter is an unsolved cosmological problem and axions are one of the weakly interacting cold dark matter candidates. Axions or ALPs (Axion-like particles) are pseudo-scalar bosons predicted by beyond-standard model theories. The weak coupling of ALPs with photons leads to the conversion of CMB photons to ALPs in the presence of a transverse magnetic field. If they have the same mass as the effective mass of a photon in a plasma, the resonant conversion would cause a polarized spectral distortion leading to temperature fluctuations with the distortion spectrum. The probability of resonant conversion depends on the properties of the cluster such as the magnetic field, electron density, and its redshift. We show that this kind of conversion can happen in numerous unresolved galaxy clusters up to high redshifts, which will lead to a diffused polarised anisotropy signal in the microwave sky. The spectrum of the signal and its shape in the angular scale will be different from the lensed CMB polarization signal. This new polarised distortion spectrum will be correlated with the distribution of clusters in the universe and hence, with the large-scale structure. The spectrum can then be probed using its spectral and spatial variation with respect to the CMB and various foregrounds. An SNR of $\sim$ 4.36 and $\sim$ 93.87 are possible in the CMB-S4 145 GHz band and CMB-HD 150 GHz band respectively for a photon-ALPs coupling strength of $\mathrm{g_{aγ} = 10^{-12} \, GeV^{-1}}$ using galaxy clusters beyond redshift z $= 1$. The same signal would lead to additional RMS fluctuations of $\sim \mathrm{7.5 \times 10^{-2} \, μK}$ at 145 GHz. In the absence of any signal, future CMB experiments such as Simons Observatory (SO), CMB-S4, and CMB-HD can put constraints on coupling strength better than current bounds from particle physics experiment CERN Axion Solar Telescope (CAST).

Submitted 14 May, 2024; originally announced May 2024.

Comments: 33 pages, 20 figures, To be submitted to JCAP

arXiv:2405.08878 [pdf, other]

A power spectrum approach to search for Axion-like Particles from resolved galaxy clusters using CMB as a backlight

Authors: Harsh Mehta, Suvodip Mukherjee

Abstract: Axions or ALPs are hypothetical particles predicted by BSM theories, which make one of the dark matter candidates. These particles can convert into photons and vice-versa in the presence of magnetic field, with a probability decided by its coupling strength $\mathrm{g_{aγ}}$. One of the ways to detect these particles is using the CMB as a backlight. As the CMB photons pass through a galaxy cluster, they can get converted into ALPs in the mass range $10^{-15}$ eV to $10^{-11}$ eV through resonant conversion in the presence of cluster magnetic fields. This leads to a polarized spectral distortion ($α$-distortion) in the CMB as the photon polarization parallel to the magnetic field in the galaxy cluster is involved in the conversion. The fluctuations in the magnetic field and electron density in a galaxy cluster lead to spatially varying $α$-distortion around the cluster, with a power spectrum that is different from the lensed CMB polarization power spectrum for the standard model of cosmology. By measuring the difference in the polarization power spectrum around a galaxy cluster from the all-sky signal, one can find new $α$-distortion in the sky. For galaxy clusters resolvable in multiple EM bands, one can measure the coupling strength $\mathrm{g_{aγ}}$ from the ALP power spectrum. Using multi-frequency techniques like ILC to clean the foregrounds, we show that the new power spectrum-based approach of the resolved galaxy clusters from upcoming CMB experiments such as Simons Observatory and CMB-S4 can detect (or put constraints) on the ALP-photon coupling strength of $\mathrm{g_{aγ} < 5.24 \times 10^{-12} \, GeV^{-1}}$ and $\mathrm{g_{aγ} < 3.61 \times 10^{-12} \, GeV^{-1}}$ at 95\% C.I. respectively for ALPs of masses $10^{-13}$ eV or for smaller $\mathrm{g_{aγ}}$ for lighter ALP masses (Abridged).

Submitted 14 May, 2024; originally announced May 2024.

Comments: 31 pages, 17 figures, To be submitted to JCAP

arXiv:2404.07523 [pdf, other]

GNN-based Probabilistic Supply and Inventory Predictions in Supply Chain Networks

Authors: Hyung-il Ahn, Young Chol Song, Santiago Olivar, Hershel Mehta, Naveen Tewari

Abstract: Successful supply chain optimization must mitigate imbalances between supply and demand over time. While accurate demand prediction is essential for supply planning, it alone does not suffice. The key to successful supply planning for optimal and viable execution lies in maximizing predictability for both demand and supply throughout an execution horizon. Therefore, enhancing the accuracy of supply predictions is imperative to create an attainable supply plan that matches demand without overstocking or understocking. However, in complex supply chain networks with numerous nodes and edges, accurate supply predictions are challenging due to dynamic node interactions, cascading supply delays, resource availability, production and logistic capabilities. Consequently, supply executions often deviate from their initial plans. To address this, we present the Graph-based Supply Prediction (GSP) probabilistic model. Our attention-based graph neural network (GNN) model predicts supplies, inventory, and imbalances using graph-structured historical data, demand forecasting, and original supply plan inputs. The experiments, conducted using historical data from a global consumer goods company's large-scale supply chain, demonstrate that GSP significantly improves supply and inventory prediction accuracy, potentially offering supply plan corrections to optimize executions.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07511 [pdf]

Generative Probabilistic Planning for Optimizing Supply Chain Networks

Authors: Hyung-il Ahn, Santiago Olivar, Hershel Mehta, Young Chol Song

Abstract: Supply chain networks in enterprises are typically composed of complex topological graphs involving various types of nodes and edges, accommodating numerous products with considerable demand and supply variability. However, as supply chain networks expand in size and complexity, traditional supply chain planning methods (e.g., those found in heuristic rule-based and operations research-based systems) tend to become locally optimal or lack computational scalability, resulting in substantial imbalances between supply and demand across nodes in the network. This paper introduces a novel Generative AI technique, which we call Generative Probabilistic Planning (GPP). GPP generates dynamic supply action plans that are globally optimized across all network nodes over the time horizon for changing objectives like maximizing profits or service levels, factoring in time-varying probabilistic demand, lead time, and production conditions. GPP leverages attention-based graph neural networks (GNN), offline deep reinforcement learning (Offline RL), and policy simulations to train generative policy models and create optimal plans through probabilistic simulations, effectively accounting for various uncertainties. Our experiments using historical data from a global consumer goods company with complex supply chain networks demonstrate that GPP accomplishes objective-adaptable, probabilistically resilient, and dynamic planning for supply chain networks, leading to significant improvements in performance and profitability for enterprises. Our work plays a pivotal role in shaping the trajectory of AI adoption within the supply chain domain.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2310.07831 [pdf, other]

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

Authors: Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko

Abstract: Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.01258 [pdf, other]

MobileNVC: Real-time 1080p Neural Video Compression on a Mobile Device

Authors: Ties van Rozendaal, Tushar Singhal, Hoang Le, Guillaume Sautiere, Amir Said, Krishna Buska, Anjuman Raha, Dimitris Kalatzis, Hitarth Mehta, Frank Mayer, Liang Zhang, Markus Nagel, Auke Wiggers

Abstract: Neural video codecs have recently become competitive with standard codecs such as HEVC in the low-delay setting. However, most neural codecs are large floating-point networks that use pixel-dense warping operations for temporal modeling, making them too computationally expensive for deployment on mobile devices. Recent work has demonstrated that running a neural decoder in real time on mobile is feasible, but shows this only for 720p RGB video. This work presents the first neural video codec that decodes 1080p YUV420 video in real time on a mobile device. Our codec relies on two major contributions. First, we design an efficient codec that uses a block-based motion compensation algorithm available on the warping core of the mobile accelerator, and we show how to quantize this model to integer precision. Second, we implement a fast decoder pipeline that concurrently runs neural network components on the neural signal processor, parallel entropy coding on the mobile GPU, and warping on the warping core. Our codec outperforms the previous on-device codec by a large margin with up to 48% BD-rate savings, while reducing the MAC count on the receiver side by $10 \times$. We perform a careful ablation to demonstrate the effect of the introduced motion compensation scheme, and ablate the effect of model quantization.

Submitted 15 November, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Matches version published at WACV 2024

arXiv:2306.00144 [pdf, other]

Mechanic: A Learning Rate Tuner

Authors: Ashok Cutkosky, Aaron Defazio, Harsh Mehta

Abstract: We introduce a technique for tuning the learning rate scale factor of any base optimization algorithm and schedule automatically, which we call \textsc{mechanic}. Our method provides a practical realization of recent theoretical reductions for accomplishing a similar goal in online convex optimization. We rigorously evaluate \textsc{mechanic} on a range of large scale deep learning tasks with varying batch sizes, schedules, and base optimization algorithms. These experiments demonstrate that depending on the problem, \textsc{mechanic} either comes very close to, matches or even improves upon manual tuning of learning rates.

Submitted 1 June, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

arXiv:2302.03775 [pdf, ps, other]

Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion

Authors: Ashok Cutkosky, Harsh Mehta, Francesco Orabona

Abstract: We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique. This improves the current best-known complexity for finding a $(δ,ε)$-stationary point from $O(ε^{-4}δ^{-1})$ stochastic gradient queries to $O(ε^{-3}δ^{-1})$, which we also show to be optimal. Our primary technique is a reduction from non-smooth non-convex optimization to online learning, after which our results follow from standard regret bounds in online learning. For deterministic and second-order smooth objectives, applying more advanced optimistic online learning techniques enables a new complexity of $O(ε^{-1.5}δ^{-0.5})$. Our techniques also recover all optimal or best-known results for finding $ε$ stationary points of smooth or second-order smooth objectives in both stochastic and deterministic settings.

Submitted 11 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

arXiv:2212.00768 [pdf, other]

Simplifying and Understanding State Space Models with Diagonal Linear RNNs

Authors: Ankit Gupta, Harsh Mehta, Jonathan Berant

Abstract: Sequence models based on linear state spaces (SSMs) have recently emerged as a promising choice of architecture for modeling long range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we dispose of the discretization step, and propose a model based on vanilla Diagonal Linear RNNs ($\mathrm{DLR}$). We empirically show that, despite being conceptually much simpler, $\mathrm{DLR}$ is as performant as previously-proposed SSMs on a variety of tasks and benchmarks including Long Range Arena and raw speech classification. Moreover, we characterize the expressivity of SSMs (including $\mathrm{DLR}$) and attention-based models via a suite of $13$ synthetic sequence-to-sequence tasks involving interactions over tens of thousands of tokens, ranging from simple operations, such as shifting an input sequence, to detecting co-dependent visual features over long spatial ranges in flattened images. We find that while SSMs report near-perfect performance on tasks that can be modeled via $\textit{few}$ convolutional kernels, they struggle on tasks requiring $\textit{many}$ such kernels and especially when the desired sequence manipulation is $\textit{context-dependent}$. Despite these limitations, $\mathrm{DLR}$ reaches high performance on two higher-order reasoning tasks $\mathrm{ListOpsSubTrees}$ and $\mathrm{PathfinderSegmentation}\text{-}\mathrm{256}$ with input lengths $8K$ and $65K$ respectively, and gives encouraging performance on $\mathrm{PathfinderSegmentation}\text{-}\mathrm{512}$ with input length $262K$ for which attention is not a viable choice.

Submitted 14 November, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: added Long Range Arena, language modeling with mixture of experts

arXiv:2211.13403 [pdf, other]

Differentially Private Image Classification from Features

Authors: Harsh Mehta, Walid Krichene, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

Abstract: Leveraging transfer learning has recently been shown to be an effective strategy for training large models with Differential Privacy (DP). Moreover, somewhat surprisingly, recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP. While past studies largely rely on algorithms like DP-SGD for training large models, in the specific case of privately learning from features, we observe that computational burden is low enough to allow for more sophisticated optimization schemes, including second-order methods. To that end, we systematically explore the effect of design parameters such as loss function and optimization algorithm. We find that, while commonly used logistic regression performs better than linear regression in the non-private setting, the situation is reversed in the private setting. We find that linear regression is much more effective than logistic regression from both privacy and computational aspects, especially at stricter epsilon values ($ε< 1$). On the optimization side, we also explore using Newton's method, and find that second-order information is quite helpful even with privacy, although the benefit significantly diminishes with stricter privacy guarantees. While both methods use second-order information, least squares is effective at lower epsilons while Newton's method is effective at larger epsilon values. To combine the benefits of both, we propose a novel algorithm called DP-FC, which leverages feature covariance instead of the Hessian of the logistic regression loss and performs well across all $ε$ values we tried. With this, we obtain new SOTA results on ImageNet-1k, CIFAR-100 and CIFAR-10 across all values of $ε$ typically considered. Most remarkably, on ImageNet-1K, we obtain top-1 accuracy of 88\% under (8, $8 * 10^{-7}$)-DP and 84.3\% under (0.1, $8 * 10^{-7}$)-DP.

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2211.11052 [pdf, other]

Convexifying Transformers: Improving optimization and understanding of transformer networks

Authors: Tolga Ergen, Behnam Neyshabur, Harsh Mehta

Abstract: Understanding the fundamental mechanism behind the success of transformer networks is still an open problem in the deep learning literature. Although their remarkable performance has been mostly attributed to the self-attention mechanism, the literature still lacks a solid analysis of these networks and interpretation of the functions learned by them. To this end, we study the training problem of attention/transformer networks and introduce a novel convex analytic approach to improve the understanding and optimization of these networks. Particularly, we first introduce a convex alternative to the self-attention mechanism and reformulate the regularized training problem of transformer networks with our alternative convex attention. Then, we cast the reformulation as a convex optimization problem that is interpretable and easier to optimize. Moreover, as a byproduct of our convex analysis, we reveal an implicit regularization mechanism, which promotes sparsity across tokens. Therefore, we not only improve the optimization of attention/transformer networks but also provide a solid theoretical understanding of the functions learned by them. We also demonstrate the effectiveness of our theory through several numerical experiments.

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2211.06389 [pdf]

What does it mean to be "representative"?

Authors: Jacqueline E. Rudolph, Yongqi Zhong, Priya Duggal, Shruti H. Mehta, Bryan Lau

Abstract: Medical and population health science researchers frequently make ambiguous statements about whether they believe their study sample or results are "representative" of some (implicit or explicit) target population. Here, we provide a comprehensive definition of representativeness, with the goal of capturing the different ways in which a study can be representative of a target population. We propose that a study is representative if the estimate obtained in the study sample is generalizable to the target population (either due to representative sampling, estimation of stratum specific effects, or quantitative methods to generalize or transport estimates) or the interpretation of the results is generalizable to the target population (based on fundamental scientific premises and substantive background knowledge). We explore this definition in the context of four COVID-19 studies, ranging from laboratory science to descriptive epidemiology. All statements regarding representativeness should make clear the way in which the study results generalize, the target population the results are being generalized to, and the assumptions that must hold for that generalization to be scientifically or statistically justifiable.

Submitted 11 November, 2022; originally announced November 2022.

Comments: 15 pages, 0 figures

arXiv:2206.13947 [pdf, other]

Long Range Language Modeling via Gated State Spaces

Authors: Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur

Abstract: State space models have been shown to be effective at modeling long range dependencies, especially on sequence classification tasks. In this work we focus on autoregressive sequence modeling over English books, GitHub source code and arXiv mathematics articles. Based on recent developments around the effectiveness of gated activation functions, we propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4 (i.e. DSS) on TPUs, is fairly competitive with several well-tuned Transformer-based baselines and exhibits zero-shot generalization to longer inputs while being straightforward to implement. Finally, we show that leveraging self-attention to model local dependencies improves the performance of GSS even further.

Submitted 2 July, 2022; v1 submitted 26 June, 2022; originally announced June 2022.

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2205.02973 [pdf, other]

Large Scale Transfer Learning for Differentially Private Image Classification

Authors: Harsh Mehta, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

Abstract: Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy. In the field of deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training algorithm. Unfortunately, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training. This is further exacerbated by the fact that increasing the number of parameters leads to larger degradation in utility with DP. In this work, we zoom in on the ImageNet dataset and demonstrate that, similar to the non-private case, pre-training over-parameterized models on a large public dataset can lead to substantial gains when the model is finetuned privately. Moreover, by systematically comparing private and non-private models across a range of large batch sizes, we find that, similar to the non-private setting, the choice of optimizer can further improve performance substantially with DP. By using the LAMB optimizer with DP-SGD, we saw improvements of up to 20 percentage points (absolute). Finally, we show that finetuning just the last layer for a single step in the full batch setting, combined with extremely small-scale (near-zero) initialization, leads to both SOTA results of 81.7% under a wide privacy budget range of $ε \in [4, 10]$ and $δ = 10^{-6}$ while minimizing the computational overhead substantially.

Submitted 20 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

arXiv:2204.07827 [pdf, other]

Local treewidth of random and noisy graphs with applications to stopping contagion in networks

Authors: Hermish Mehta, Daniel Reichman

Abstract: We study the notion of local treewidth in sparse random graphs: the maximum treewidth over all $k$-vertex subgraphs of an $n$-vertex graph. When $k$ is not too large, we give nearly tight bounds for this local treewidth parameter; we also derive tight bounds for the local treewidth of noisy trees, trees where every non-edge is added independently with small probability. We apply our upper bounds on the local treewidth to obtain fixed parameter tractable algorithms (on random graphs and noisy trees) for edge-removal problems centered around containing a contagious process evolving over a network. In these problems, our main parameter of study is $k$, the number of initially "infected" vertices in the network. For the random graph models we consider and a certain range of parameters the running time of our algorithms on $n$-vertex graphs is $2^{o(k)}\textrm{poly}(n)$, improving upon the $2^{Ω(k)}\textrm{poly}(n)$ performance of the best-known algorithms designed for worst-case instances of these edge deletion problems.

Submitted 15 July, 2022; v1 submitted 16 April, 2022; originally announced April 2022.

Comments: Accepted to RANDOM 2022

arXiv:2203.04159 [pdf, other]

doi 10.1016/j.iot.2022.100514

AI for Next Generation Computing: Emerging Trends and Future Directions

Authors: Sukhpal Singh Gill, Minxian Xu, Carlo Ottaviani, Panos Patros, Rami Bahsoon, Arash Shaghaghi, Muhammed Golec, Vlado Stankovski, Huaming Wu, Ajith Abraham, Manmeet Singh, Harshit Mehta, Soumya K. Ghosh, Thar Baker, Ajith Kumar Parlikad, Hanan Lutfiyya, Salil S. Kanhere, Rizos Sakellariou, Schahram Dustdar, Omer Rana, Ivona Brandic, Steve Uhlig

Abstract: Autonomic computing investigates how systems can achieve (user) specified control outcomes on their own, without the intervention of a human operator. Autonomic computing fundamentals have been substantially influenced by those of control theory for closed and open-loop systems. In practice, complex systems may exhibit a number of concurrent and inter-dependent control loops. Despite research into autonomic models for managing computer resources, ranging from individual resources (e.g., web servers) to a resource ensemble (e.g., multiple resources within a data center), research into integrating Artificial Intelligence (AI) and Machine Learning (ML) to improve resource autonomy and performance at scale continues to be a fundamental challenge. The integration of AI/ML to achieve such autonomic and self-management of systems can be achieved at different levels of granularity, from full to human-in-the-loop automation. In this article, leading academics, researchers, practitioners, engineers, and scientists in the fields of cloud computing, AI/ML, and quantum computing join to discuss current research and potential future directions for these fields. Further, we discuss challenges and opportunities for leveraging AI and ML in next generation computing for emerging computing paradigms, including cloud, fog, edge, serverless and quantum computing environments.

Submitted 5 March, 2022; originally announced March 2022.

Comments: Accepted for Publication in Elsevier IoT Journal, 2022

arXiv:2202.06991 [pdf, other]

Transformer Memory as a Differentiable Search Index

Authors: Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

Abstract: In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model. To this end, we introduce the Differentiable Search Index (DSI), a new paradigm that learns a text-to-text model that maps string queries directly to relevant docids; in other words, a DSI model answers queries directly using only its parameters, dramatically simplifying the whole retrieval process. We study variations in how documents and their identifiers are represented, variations in training procedures, and the interplay between models and corpus sizes. Experiments demonstrate that given appropriate design choices, DSI significantly outperforms strong baselines such as dual encoder models. Moreover, DSI demonstrates strong generalization capabilities, outperforming a BM25 baseline in a zero-shot setup.

Submitted 21 October, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: NeurIPS 2022

arXiv:2112.02194 [pdf, other]

ALX: Large Scale Matrix Factorization on TPUs

Authors: Harsh Mehta, Steffen Rendle, Walid Krichene, Li Zhang

Abstract: We present ALX, an open-source library for distributed matrix factorization using Alternating Least Squares, written in JAX. Our design allows for efficient use of the TPU architecture and scales well to matrix factorization problems of O(B) rows/columns by scaling the number of available TPU cores. In order to spur future research on large scale matrix factorization methods and to illustrate the scalability properties of our own implementation, we also built a real world web link prediction dataset called WebGraph. This dataset can be easily modeled as a matrix factorization problem. We created several variants of this dataset based on locality and sparsity properties of sub-graphs. The largest variant of WebGraph has around 365M nodes and training a single epoch finishes in about 20 minutes with 256 TPU cores. We include speed and performance numbers of ALX on all variants of WebGraph. Both the framework code and the dataset are open-sourced.

Submitted 29 March, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

arXiv:2110.05603 [pdf, other]

Generalizing to New Domains by Mapping Natural Language to Lifted LTL

Authors: Eric Hsiung, Hiloni Mehta, Junchi Chu, Xinyu Liu, Roma Patel, Stefanie Tellex, George Konidaris

Abstract: Recent work on using natural language to specify commands to robots has grounded that language to LTL. However, mapping natural language task specifications to LTL task specifications using language models requires probability distributions over finite vocabulary. Existing state-of-the-art methods have extended this finite vocabulary to include unseen terms from the input sequence to improve output generalization. However, novel out-of-vocabulary atomic propositions cannot be generated using these methods. To overcome this, we introduce an intermediate contextual query representation which can be learned from single positive task specification examples, associating a contextual query with an LTL template. We demonstrate that this intermediate representation allows for generalization over unseen object references, assuming accurate groundings are available. We compare our method of mapping natural language task specifications to intermediate contextual queries against state-of-the-art CopyNet models capable of translating natural language to LTL, by evaluating whether correct LTL for manipulation and navigation task specifications can be output, and show that our method outperforms the CopyNet model on unseen object references. We demonstrate that the grounded LTL our method outputs can be used for planning in a simulated OO-MDP environment. Finally, we discuss some common failure modes encountered when translating natural language task specifications to grounded LTL.

Submitted 9 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 7 pages (6 + 1 references page), 3 figures, 2 tables. Accepted to ICRA 2022. To appear in Proceedings of the 2022 International Conference on Robotics and Automation, May 2022

arXiv:2106.14343 [pdf, other]

High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails

Authors: Ashok Cutkosky, Harsh Mehta

Abstract: We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high-probability with best-known rates for smooth losses when the gradients only have bounded $\mathfrak{p}$th moments for some $\mathfrak{p}\in(1,2]$. We then consider the case of second-order smooth losses, which to our knowledge have not been studied in this setting, and again obtain high-probability bounds for any $\mathfrak{p}$. Moreover, our results hold for arbitrary smooth norms, in contrast to the typical SGD analysis which requires a Hilbert space norm. Further, we show that after a suitable "burn-in" period, the objective value will monotonically decrease for every iteration until a critical point is identified, which provides intuition behind the popular practice of learning rate "warm-up" and also yields a last-iterate guarantee.

Submitted 9 November, 2021; v1 submitted 27 June, 2021; originally announced June 2021.

arXiv:2008.13363 [pdf, other]

Extreme Memorization via Scale of Initialization

Authors: Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur

Abstract: We construct an experimental setup in which changing the scale of initialization strongly impacts the implicit regularization induced by SGD, interpolating from good generalization performance to completely memorizing the training set while making little progress on the test set. Moreover, we find that the extent and manner in which generalization ability is affected depend on the activation and loss function used, with $\sin$ activation demonstrating extreme memorization. In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function. Our empirical investigation reveals that increasing the scale of initialization correlates with misalignment of representations and gradients across examples in the same class. This insight allows us to devise an alignment measure over gradients and representations which can capture this phenomenon. We demonstrate that our alignment measure correlates with generalization of deep models trained on image classification tasks.

Submitted 1 May, 2021; v1 submitted 31 August, 2020; originally announced August 2020.

arXiv:2006.00342 [pdf, other]

WattsApp: Power-Aware Container Scheduling

Authors: Hemant Mehta, Paul Harvey, Omer Rana, Rajkumar Buyya, Blesson Varghese

Abstract: Containers are becoming a popular workload deployment mechanism in modern distributed systems. However, there are limited software-based methods (hardware-based methods are expensive, requiring hardware level changes) for obtaining the power consumed by containers for facilitating power-aware container scheduling, an essential activity for efficient management of distributed systems. This paper presents WattsApp, a tool underpinned by a six step software-based method for power-aware container scheduling to minimize power cap violations on a server. The proposed method relies on a neural network-based power estimation model and a power capped container scheduling technique. Experimental studies are pursued in a lab-based environment on 10 benchmarks deployed on Intel and ARM processors. The results highlight that the power estimation model has negligible overheads for data collection - nearly 90% of all data samples can be estimated with less than a 10% error, and the Mean Absolute Percentage Error (MAPE) is less than 6%. The power-aware scheduling of WattsApp is more effective than Intel's Running Average Power Limit (RAPL) based power capping for both single and multiple containers as it does not degrade the performance of all containers running on the server. The results confirm the feasibility of WattsApp.

Submitted 30 May, 2020; originally announced June 2020.

arXiv:2002.03305 [pdf, other]

Momentum Improves Normalized SGD

Authors: Ashok Cutkosky, Harsh Mehta

Abstract: We provide an improved analysis of normalized SGD showing that adding momentum provably removes the need for large batch sizes on non-convex objectives. Then, we consider the case of objectives with bounded second derivative and show that in this case a small tweak to the momentum formula allows normalized SGD with momentum to find an $ε$-critical point in $O(1/ε^{3.5})$ iterations, matching the best-known rates without accruing any logarithmic factors or dependence on dimension. We also provide an adaptive method that automatically improves convergence rates when the variance in the gradients is small. Finally, we show that our method is effective when employed on popular large scale tasks such as ResNet-50 and BERT pretraining, matching the performance of the disparate methods used to get state-of-the-art results on both tasks.

Submitted 16 May, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

arXiv:2001.03671 [pdf, other]

Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View

Authors: Harsh Mehta, Yoav Artzi, Jason Baldridge, Eugene Ie, Piotr Mirowski

Abstract: The Touchdown dataset (Chen et al., 2019) provides instructions by human annotators for navigation through New York City streets and for resolving spatial descriptions at a given location. To enable the wider research community to work effectively with the Touchdown tasks, we are publicly releasing the 29k raw Street View panoramas needed for Touchdown. We follow the process used for the StreetLearn data release (Mirowski et al., 2019) to check panoramas for personally identifiable information and blur them as necessary. These have been added to the StreetLearn dataset and can be obtained via the same process as used previously for StreetLearn. We also provide a reference implementation for both of the Touchdown tasks: vision and language navigation (VLN) and spatial description resolution (SDR). We compare our model results to those given in Chen et al. (2019) and show that the panoramas we have added to StreetLearn fully support both Touchdown tasks and can be used effectively for further research and comparison.

Submitted 10 January, 2020; originally announced January 2020.

arXiv:1912.03241 [pdf, other]

VALAN: Vision and Language Agent Navigation

Authors: Larry Lansing, Vihan Jain, Harsh Mehta, Haoshuo Huang, Eugene Ie

Abstract: VALAN is a lightweight and scalable software framework for deep reinforcement learning based on the SEED RL architecture. The framework facilitates the development and evaluation of embodied agents for solving grounded language understanding tasks, such as Vision-and-Language Navigation and Vision-and-Dialog Navigation, in photo-realistic environments, such as Matterport3D and Google StreetView. We have added a minimal set of abstractions on top of SEED RL allowing us to generalize the architecture to solve a variety of other RL problems. In this article, we will describe VALAN's software abstraction and architecture, and also present an example of using VALAN to design agents for instruction-conditioned indoor navigation.

Submitted 6 December, 2019; originally announced December 2019.

arXiv:1911.01941 [pdf]

doi 10.1016/j.iot.2019.100118

Transformative effects of IoT, Blockchain and Artificial Intelligence on cloud computing: Evolution, vision, trends and open challenges

Authors: Sukhpal Singh Gill, Shreshth Tuli, Minxian Xu, Inderpreet Singh, Karan Vijay Singh, Dominic Lindsay, Shikhar Tuli, Daria Smirnova, Manmeet Singh, Udit Jain, Haris Pervaiz, Bhanu Sehgal, Sukhwinder Singh Kaila, Sanjay Misra, Mohammad Sadegh Aslanpour, Harshit Mehta, Vlado Stankovski, Peter Garraghan

Abstract: Cloud computing plays a critical role in modern society and enables a range of applications from infrastructure to social media. Such systems must cope with varying load and evolving usage, reflecting society's interaction with and dependency on automated computing systems, whilst satisfying Quality of Service (QoS) guarantees. Enabling these systems is a cohort of conceptual technologies, synthesized to meet the demands of evolving computing applications. In order to understand current and future challenges of such systems, there is a need to identify key technologies enabling future applications. In this study, we aim to explore how three emerging paradigms (Blockchain, IoT and Artificial Intelligence) will influence future cloud computing systems. Further, we identify several technologies driving these paradigms and invite international experts to discuss the current status and future directions of cloud computing. Finally, we propose a conceptual model for cloud futurology to explore the influence of emerging paradigms and technologies on the evolution of cloud computing.

Submitted 21 October, 2019; originally announced November 2019.

Comments: 30 Pages, 4 Figures and Preprint version - Published in Elsevier's Internet of Things Journal

arXiv:1911.00121 [pdf, ps, other]

Counting extensions of number fields with Frobenius Galois group

Authors: Harsh Mehta

Abstract: Let $G$ be a Frobenius group with an abelian Frobenius kernel $F$ and let $k$ be a finite extension of $\mathbb{Q}$. We obtain an upper bound for the number of degree $|F|$ algebraic extensions $K/k$ with Galois group $G$ with the norm of the discriminant $\mathcal{N}_{k/\mathbb{Q}}(d_{K/k})$ bounded above by $X$. We extend this method for any group $G$ that has an abelian normal subgroup. If $G$ has an abelian normal subgroup, then we obtain upper bounds for the number of degree $|G|$ extensions $N/k$ with Galois group $G$ with bounded norm of the discriminant. Malle made a conjecture about what the order of magnitude of this quantity should be as the degree of the extension $d$ and underlying Galois group $G$ vary. We show that under the $\ell$-torsion conjecture, the upper bounds we achieve for certain pairs $d$ and $G$ agree with the prediction of Malle. Unconditionally we show that the upper bound for the number of degree 6 extensions with Galois group $A_4$ also satisfies Malle's weak conjecture.

Submitted 31 October, 2019; originally announced November 2019.

Comments: This is a preliminary version

arXiv:1908.03409 [pdf, other]

Transferable Representation Learning in Vision-and-Language Navigation

Authors: Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie

Abstract: Vision-and-Language Navigation (VLN) tasks such as Room-to-Room (R2R) require machine agents to interpret natural language instructions and learn to act in visually realistic environments to achieve navigation goals. The overall task requires competence in several perception problems: successful agents combine spatio-temporal, vision and language understanding to produce appropriate action sequences. Our approach adapts pre-trained vision and language representations to relevant in-domain tasks making them more effective for VLN. Specifically, the representations are adapted to solve both a cross-modal sequence alignment and sequence coherence task. In the sequence alignment task, the model determines whether an instruction corresponds to a sequence of visual frames. In the sequence coherence task, the model determines whether the perceptual sequences are predictive sequentially in the instruction-conditioned latent space. By transferring the domain-adapted representations, we improve competitive agents in R2R as measured by the success rate weighted by path length (SPL) metric.

Submitted 12 August, 2019; v1 submitted 9 August, 2019; originally announced August 2019.

Comments: To appear in ICCV 2019

arXiv:1905.13358 [pdf, other]

Multi-modal Discriminative Model for Vision-and-Language Navigation

Authors: Haoshuo Huang, Vihan Jain, Harsh Mehta, Jason Baldridge, Eugene Ie

Abstract: Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals. Successful agents must have the ability to parse natural language of varying linguistic styles, ground them in potentially unfamiliar scenes, plan and react with ambiguous environmental feedback. Generalization ability is limited by the amount of human annotated data. In particular, paired vision-language sequence data is expensive to collect. We develop a discriminator that evaluates how well an instruction explains a given path in the VLN task using multi-modal alignment. Our study reveals that only a small fraction of the high-quality augmented data from Fried et al. (2018), as scored by our discriminator, is useful for training VLN agents with similar performance on previously unseen environments. We also show that a VLN agent warm-started with pre-trained components from the discriminator outperforms the benchmark success rates of 35.5 by 10% relative measure on previously unseen environments.

Submitted 30 May, 2019; originally announced May 2019.

Comments: Accepted at SpLU-RoboNLP 2019 (workshop at NAACL)

arXiv:1808.03633 [pdf, ps, other]

A New Algorithm for the Robust Semi-random Independent Set Problem

Authors: Theo McKenzie, Hermish Mehta, Luca Trevisan

Abstract: In this paper, we study a general semi-random version of the planted independent set problem in a model initially proposed by Feige and Kilian, which has a large proportion of adversarial edges. We give a new deterministic algorithm that finds a list of independent sets, one of which, with high probability, is the planted one, provided that the planted set has size $k=Ω(n^{2/3})$. This improves on Feige and Kilian's original randomized algorithm, which with high probability recovers an independent set of size at least $k$ when $k=αn$ where $α$ is a constant.

Submitted 30 October, 2019; v1 submitted 10 August, 2018; originally announced August 2018.

arXiv:1712.06957 [pdf, other]

MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs

Authors: Pranav Rajpurkar, Jeremy Irvin, Aarti Bagul, Daisy Ding, Tony Duan, Hershel Mehta, Brandon Yang, Kaylie Zhu, Dillon Laird, Robyn L. Ball, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Y. Ng

Abstract: We introduce MURA, a large dataset of musculoskeletal radiographs containing 40,561 images from 14,863 studies, where each study is manually labeled by radiologists as either normal or abnormal. To evaluate models robustly and to get an estimate of radiologist performance, we collect additional labels from six board-certified Stanford radiologists on the test set, consisting of 207 musculoskeletal studies. On this test set, the majority vote of a group of three radiologists serves as gold standard. We train a 169-layer DenseNet baseline model to detect and localize abnormalities. Our model achieves an AUROC of 0.929, with an operating point of 0.815 sensitivity and 0.887 specificity. We compare our model and radiologists on the Cohen's kappa statistic, which expresses the agreement of our model and of each radiologist with the gold standard. Model performance is comparable to the best radiologist performance in detecting abnormalities on finger and wrist studies. However, model performance is lower than best radiologist performance in detecting abnormalities on elbow, forearm, hand, humerus, and shoulder studies. We believe that the task is a good challenge for future research. To encourage advances, we have made our dataset freely available at https://stanfordmlgroup.github.io/competitions/mura .

Submitted 22 May, 2018; v1 submitted 11 December, 2017; originally announced December 2017.

Comments: 1st Conference on Medical Imaging with Deep Learning (MIDL 2018)

arXiv:1711.05225 [pdf, other]

CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

Authors: Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Y. Ng

Abstract: We develop an algorithm that can detect pneumonia from chest X-rays at a level exceeding practicing radiologists. Our algorithm, CheXNet, is a 121-layer convolutional neural network trained on ChestX-ray14, currently the largest publicly available chest X-ray dataset, containing over 100,000 frontal-view X-ray images with 14 diseases. Four practicing academic radiologists annotate a test set, on which we compare the performance of CheXNet to that of radiologists. We find that CheXNet exceeds average radiologist performance on the F1 metric. We extend CheXNet to detect all 14 diseases in ChestX-ray14 and achieve state-of-the-art results on all 14 diseases.

Submitted 25 December, 2017; v1 submitted 14 November, 2017; originally announced November 2017.

arXiv:1503.00199 [pdf, other]

doi 10.1080/10586458.2015.1020578

Products of Farey Fractions

Authors: Jeffrey Lagarias, Harsh Mehta

Abstract: The Farey fractions $F_n$ of order $n$ consist of all fractions $\frac{h}{k}$ in lowest terms lying in the closed unit interval and having denominator at most $n$. This paper considers the products $F_n$ of all nonzero Farey fractions of order $n$. It studies their growth measured by $\log(F_n)$ and their divisibility properties by powers of a fixed prime, given by $ord_p(F_n)$, as a function of $n$. The growth of $\log(F_n)$ is related to the Riemann hypothesis. This paper theoretically and empirically studies the functions $ord_p(F_n)$ and formulates several unproved properties (P1)-(P4) they may have. It presents evidence raising the possibility that the Riemann hypothesis may also be encoded in $ord_p(F_n)$ for a single prime $p$. This encoding makes use of a relation of these products to the products $G_n$ of all reduced and unreduced Farey fractions of order $n$, which are connected by Möbius inversion. It involves new arithmetic functions which mix the Möbius function with functions of radix expansions to a fixed prime base $p$.

Submitted 9 May, 2017; v1 submitted 28 February, 2015; originally announced March 2015.

Comments: 32 pages, 10 figures

Journal ref: Experimental Mathematics 26, No. 1, 1--21 (2017)
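
As a rough illustration of the quantities named in this abstract, the following Python sketch computes the product of all nonzero Farey fractions of order $n$ and its divisibility by a fixed prime. The function names `farey_product` and `ord_p` are hypothetical and chosen only for this example; the code follows the definition given in the abstract (fractions $h/k$ in lowest terms in the closed unit interval with denominator at most $n$) and is not taken from the paper.

```python
from fractions import Fraction
from math import gcd


def farey_product(n):
    """Product of all nonzero Farey fractions h/k of order n.

    Assumes the definition in the abstract: 0 < h <= k <= n with gcd(h, k) = 1.
    """
    product = Fraction(1)
    for k in range(1, n + 1):
        for h in range(1, k + 1):
            if gcd(h, k) == 1:
                product *= Fraction(h, k)
    return product


def ord_p(x, p):
    """p-adic order of a nonzero rational x: exponent of p in its factorization."""
    num, den = x.numerator, x.denominator
    order = 0
    while num % p == 0:
        num //= p
        order += 1
    while den % p == 0:
        den //= p
        order -= 1
    return order


if __name__ == "__main__":
    for n in range(1, 7):
        f = farey_product(n)
        print(n, f, ord_p(f, 2), ord_p(f, 3))
```

Exact rational arithmetic via `fractions.Fraction` keeps the prime factorization intact, which floating-point products would not.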

arXiv:1409.4145 [pdf, other]

doi 10.1142/S1793042116500044

Products of binomial coefficients and unreduced Farey fractions

Authors: Jeffrey C. Lagarias, Harsh Mehta

Abstract: This paper studies the product $\bar{G}_n$ of the binomial coefficients in the n-th row of Pascal's triangle, which equals the reciprocal of the product of all the reduced and unreduced Farey fractions of order n. It studies its size as a real number, measured by its logarithm $\log(\bar{G}_n)$, and its prime factorization, measured by the order of divisibility by a fixed prime p, each viewed as a function of n. It derives three formulas for its prime power divisibility, $ord_p(\bar{G}_n)$, two of which relate it to base p radix expansions of n, and which display different facets of its behavior. These formulas are used to determine the maximal growth rate of each $ord_p(\bar{G}_n)$ and structure of the fluctuations of these functions. It also defines analogous functions for all integer bases $b$ replacing prime bases. A final topic relates the factorizations of $\bar{G}_n$ to Chebyshev-type prime-counting estimates and the prime number theorem.

Submitted 15 September, 2015; v1 submitted 14 September, 2014; originally announced September 2014.

Comments: 30 pages, 3 figures, two appendices; v2 is 31 pages, appendices moved before reference list; v3 is 31 pages, corrections to match journal version

MSC Class: Primary 11B65; Secondary: 05A10; 11B57; 11N05; 11N64

Journal ref: International J. of Number Theory 12 (2016), no.1, 57--91
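
The identity stated in this abstract can be checked numerically for small $n$. The sketch below is illustrative only: it assumes that the "reduced and unreduced Farey fractions of order n" are all fractions $h/k$ with $1 \le h \le k \le n$, and the function names are hypothetical. It requires Python 3.8 or later for `math.comb`.

```python
from fractions import Fraction
from math import comb


def binomial_row_product(n):
    """Product of the binomial coefficients C(n, 0), ..., C(n, n)."""
    product = 1
    for k in range(n + 1):
        product *= comb(n, k)
    return product


def unreduced_farey_product(n):
    """Product of all fractions h/k with 1 <= h <= k <= n.

    Assumption: this is the set of reduced and unreduced Farey fractions
    of order n referred to in the abstract.
    """
    product = Fraction(1)
    for k in range(1, n + 1):
        for h in range(1, k + 1):
            product *= Fraction(h, k)
    return product


if __name__ == "__main__":
    for n in range(1, 9):
        # The two products should be reciprocals of each other.
        print(n, binomial_row_product(n), unreduced_farey_product(n))
```

Under that assumption, multiplying the two values for the same $n$ gives exactly 1, which is a quick sanity check on the reciprocal relation.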

arXiv:1311.1407 [pdf, other]

The L1 norm of the generalized de la Vallee Poussin kernel

Authors: Harsh Mehta

Abstract: Charles de la Vallée Poussin defined two different kernels that bear his name. This paper considers the one that is a linear combination of two Fejér kernels, which are known as the delayed means. We show that the $L^1$ norms are constant in families of delayed means, and determine the exact value.

Submitted 5 November, 2013; originally announced November 2013.

Comments: 12 pages, 4 figures

MSC Class: 42A16

ACM Class: F.2.1

arXiv:1201.4210 [pdf]

Collaborative Personalized Web Recommender System using Entropy based Similarity Measure

Authors: Harita Mehta, Shveta Kundra Bhatia, Punam Bedi, V. S. Dixit

Abstract: On the internet, web surfers, in the search of information, always strive for recommendations. The solutions for generating recommendations become more difficult because of exponential increase in information domain day by day. In this paper, we have calculated entropy based similarity between users to achieve solution for scalability problem. Using this concept, we have implemented an online user based collaborative web recommender system. In this model based collaborative system, the user session is divided into two levels. Entropy is calculated at both the levels. It is shown that from the set of valuable recommenders obtained at level I; only those recommenders having lower entropy at level II than entropy at level I, served as trustworthy recommenders. Finally, top N recommendations are generated from such trustworthy recommenders for an online user.

Submitted 20 January, 2012; originally announced January 2012.

Comments: 10 pages

Journal ref: IJCSI, Vol 8, Issue 6, No 3, Nov 2011

Showing 1–41 of 41 results for author: Mehta, H