-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…
▽ More
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
△ Less
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT
Authors:
Seyed Mohammad Hossein Hashemi,
Leila Safari,
Amirhossein Dadashzade Taromi
Abstract:
In the field of medical sciences, reliable detection and classification of brain tumors from images remains a formidable challenge due to the rarity of tumors within the population of patients. Therefore, the ability to detect tumors in anomaly scenarios is paramount for ensuring timely interventions and improved patient outcomes. This study addresses the issue by leveraging deep learning (DL) tec…
▽ More
In the field of medical sciences, reliable detection and classification of brain tumors from images remains a formidable challenge due to the rarity of tumors within the population of patients. Therefore, the ability to detect tumors in anomaly scenarios is paramount for ensuring timely interventions and improved patient outcomes. This study addresses the issue by leveraging deep learning (DL) techniques to detect and classify brain tumors in challenging situations. The curated data set from the National Brain Map** Lab (NBML) comprises 81 patients, including 30 Tumor cases and 51 Normal cases. The detection and classification pipelines are separated into two consecutive tasks. The detection phase involved comprehensive data analysis and pre-processing to modify the number of image samples and the number of patients of each class to anomaly distribution (9 Normal per 1 Tumor) to comply with real world scenarios. Next, in addition to common evaluation metrics for the testing, we employed a novel performance evaluation method called Patient to Patient (PTP), focusing on the realistic evaluation of the model. In the detection phase, we fine-tuned a YOLOv8n detection model to detect the tumor region. Subsequent testing and evaluation yielded competitive performance both in Common Evaluation Metrics and PTP metrics. Furthermore, using the Data Efficient Image Transformer (DeiT) module, we distilled a Vision Transformer (ViT) model from a fine-tuned ResNet152 as a teacher in the classification phase. This approach demonstrates promising strides in reliable tumor detection and classification, offering potential advancements in tumor diagnosis for real-world medical imaging scenarios.
△ Less
Submitted 10 January, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Differentially Private Heavy Hitter Detection using Federated Analytics
Authors:
Karan Chadha,
Junye Chen,
John Duchi,
Vitaly Feldman,
Hanieh Hashemi,
Omid Javidbakht,
Audra McMillan,
Kunal Talwar
Abstract:
In this work, we study practical heuristics to improve the performance of prefix-tree based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points and the goal is to learn as many of the most frequent data points as possible across all users' data with aggregate and local differential privacy. We propose an adaptive hyperparameter tuning…
▽ More
In this work, we study practical heuristics to improve the performance of prefix-tree based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points and the goal is to learn as many of the most frequent data points as possible across all users' data with aggregate and local differential privacy. We propose an adaptive hyperparameter tuning algorithm that improves the performance of the algorithm while satisfying computational, communication and privacy constraints. We explore the impact of different data-selection schemes as well as the impact of introducing deny lists during multiple runs of the algorithm. We test these improvements using extensive experimentation on the Reddit dataset~\cite{caldas2018leaf} on the task of learning the most frequent words.
△ Less
Submitted 21 July, 2023;
originally announced July 2023.
-
A Tractable Statistical Representation of IFTR Fading with Applications
Authors:
Maryam Olyaee,
Hadi Hashemi,
Juan M. Romero-Jerez
Abstract:
The recently introduced independent fluctuating two-ray (IFTR) fading model, consisting of two specular components fluctuating independently plus a diffuse component, has proven to provide an excellent fit to different wireless environments, including the millimeter-wave band. However, the original formulations of the probability density function (PDF) and cumulative distribution function (CDF) of…
▽ More
The recently introduced independent fluctuating two-ray (IFTR) fading model, consisting of two specular components fluctuating independently plus a diffuse component, has proven to provide an excellent fit to different wireless environments, including the millimeter-wave band. However, the original formulations of the probability density function (PDF) and cumulative distribution function (CDF) of this model are not applicable to all possible values of its defining parameters, and are given in terms of multifold generalized hypergeometric functions, which prevents their widespread use for the derivation of performance metric expressions. In this paper we present a new formulation of the IFTR model as a countable mixture of Gamma distributions which greatly facilitates the performance evaluation for this model in terms of the metrics already known for the much simpler and widely used Nakagami-m fading. Additionally, a closed-form expression is presented for the generalized moment generating function (GMGF), which permits to readily obtain all the moments of the distribution of the model, as well as several relevant performance metrics. Based on these new derivations, the IFTR model is evaluated for the average channel capacity, the outage probability with and without co-channel interference, and the bit error rate (BER), which are verified by Monte Carlo simulations.
△ Less
Submitted 12 July, 2023;
originally announced July 2023.
-
Dense Retrieval Adaptation using Target Domain Description
Authors:
Helia Hashemi,
Yong Zhuang,
Sachith Sri Ram Kothur,
Srivas Prasad,
Edgar Meij,
W. Bruce Croft
Abstract:
In information retrieval (IR), domain adaptation is the process of adapting a retrieval model to a new domain whose data distribution is different from the source domain. Existing methods in this area focus on unsupervised domain adaptation where they have access to the target document collection or supervised (often few-shot) domain adaptation where they additionally have access to (limited) labe…
▽ More
In information retrieval (IR), domain adaptation is the process of adapting a retrieval model to a new domain whose data distribution is different from the source domain. Existing methods in this area focus on unsupervised domain adaptation where they have access to the target document collection or supervised (often few-shot) domain adaptation where they additionally have access to (limited) labeled data in the target domain. There also exists research on improving zero-shot performance of retrieval models with no adaptation. This paper introduces a new category of domain adaptation in IR that is as-yet unexplored. Here, similar to the zero-shot setting, we assume the retrieval model does not have access to the target document collection. In contrast, it does have access to a brief textual description that explains the target domain. We define a taxonomy of domain attributes in retrieval tasks to understand different properties of a source domain that can be adapted to a target domain. We introduce a novel automatic data construction pipeline that produces a synthetic document collection, query set, and pseudo relevance labels, given a textual domain description. Extensive experiments on five diverse target domains show that adapting dense retrieval models using the constructed synthetic data leads to effective retrieval performance on the target domain.
△ Less
Submitted 5 July, 2023;
originally announced July 2023.
-
PaLM 2 Technical Report
Authors:
Rohan Anil,
Andrew M. Dai,
Orhan Firat,
Melvin Johnson,
Dmitry Lepikhin,
Alexandre Passos,
Siamak Shakeri,
Emanuel Taropa,
Paige Bailey,
Zhifeng Chen,
Eric Chu,
Jonathan H. Clark,
Laurent El Shafey,
Yan** Huang,
Kathy Meier-Hellstern,
Gaurav Mishra,
Erica Moreira,
Mark Omernick,
Kevin Robinson,
Sebastian Ruder,
Yi Tay,
Kefan Xiao,
Yuanzhong Xu,
Yu**g Zhang,
Gustavo Hernandez Abrego
, et al. (103 additional authors not shown)
Abstract:
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…
▽ More
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.
When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.
△ Less
Submitted 13 September, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Task Space Control of Robot Manipulators based on Visual SLAM
Authors:
Seyed Hamed Hashemi,
Jouni Mattila
Abstract:
This paper aims to address the open problem of designing a globally stable vision-based controller for robot manipulators. Accordingly, based on a hybrid mechanism, this paper proposes a novel task-space control law attained by taking the gradient of a potential function in SE(3). The key idea is to employ the Visual Simultaneous Localization and Map** (VSLAM) algorithm to estimate a robot pose.…
▽ More
This paper aims to address the open problem of designing a globally stable vision-based controller for robot manipulators. Accordingly, based on a hybrid mechanism, this paper proposes a novel task-space control law attained by taking the gradient of a potential function in SE(3). The key idea is to employ the Visual Simultaneous Localization and Map** (VSLAM) algorithm to estimate a robot pose. The estimated robot pose is then used in the proposed hybrid controller as feedback information. Invoking Barbalats lemma and Lyapunov's stability theorem, it is guaranteed that the resulting closed-loop system is globally asymptotically stable, which is the main accomplishment of the proposed structure. Simulation studies are conducted on a six degrees of freedom (6-DOF) robot manipulator to demonstrate the effectiveness and validate the performance of the proposed VSLAM-based control scheme.
△ Less
Submitted 8 February, 2023;
originally announced February 2023.
-
Analysis of the Outage Probability of Ground-Based Relaying for Satellite Systems
Authors:
Hadi Hashemi,
Beatriz Soret,
M. Carmen Aguayo-Torres
Abstract:
This paper investigates the theoretical basis for using ground relaying in multi-antenna satellites exposed to blocking situations. Inactive and unobstructed User Equipments (UEs) located on ground are the relaying nodes of UEs that are not in the field of view of the satellite. Exact closed-form relationships of the Signal-to-Noise Ratio (SNR) and the outage probability are obtained for the case…
▽ More
This paper investigates the theoretical basis for using ground relaying in multi-antenna satellites exposed to blocking situations. Inactive and unobstructed User Equipments (UEs) located on ground are the relaying nodes of UEs that are not in the field of view of the satellite. Exact closed-form relationships of the Signal-to-Noise Ratio (SNR) and the outage probability are obtained for the case where each user is connected to two transmitting antennas at the satellite. The channel between the satellite and ground is modeled as a shadowed-Rician (SR), whereas a Fluctuating Two-Ray (FTR) fading model is used for the mmWaves ground link between relay and UE, as well as cases with perfect and imperfect Channel State Information (CSI). The simulation results showed that if perfect CSI is not available in the pre-coding and only the signal phase is estimated, the performance loss is minimal and the system can reach its ideal performance by spending limited power. In any case, the closed-form for the ideal state are a good proxy to predict the performance under non-ideal conditions.
△ Less
Submitted 13 December, 2022;
originally announced December 2022.
-
Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems
Authors:
Hanieh Hashemi,
Wenjie Xiong,
Liu Ke,
Kiwan Maeng,
Murali Annavaram,
G. Edward Suh,
Hsien-Hsin S. Lee
Abstract:
Online personalized recommendation services are generally hosted in the cloud where users query the cloud-based model to receive recommended input such as merchandise of interest or news feed. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model…
▽ More
Online personalized recommendation services are generally hosted in the cloud where users query the cloud-based model to receive recommended input such as merchandise of interest or news feed. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model size, there was not enough attention paid to the potential information leakage through sparse features. These sparse features are employed to track users' behavior, e.g., their click history, object interactions, etc., potentially carrying each user's private information. Sparse features are represented as learned embedding vectors that are stored in large tables, and personalized recommendation is performed by using a specific user's sparse feature to index through the tables. Even with recently-proposed methods that hides the computation happening in the cloud, an attacker in the cloud may be able to still track the access patterns to the embedding tables. This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns. We first characterize the types of attacks that can be carried out on sparse features in recommendation models in an untrusted cloud, followed by a demonstration of how each of these attacks leads to extracting users' private information or tracking users by their behavior over time.
△ Less
Submitted 12 December, 2022;
originally announced December 2022.
-
DarKnight: An Accelerated Framework for Privacy and Integrity Preserving Deep Learning Using Trusted Hardware
Authors:
Hanieh Hashemi,
Yongqin Wang,
Murali Annavaram
Abstract:
Privacy and security-related concerns are growing as machine learning reaches diverse application domains. The data holders want to train or infer with private data while exploiting accelerators, such as GPUs, that are hosted in the cloud. Cloud systems are vulnerable to attackers that compromise the privacy of data and integrity of computations. Tackling such a challenge requires unifying theoret…
▽ More
Privacy and security-related concerns are growing as machine learning reaches diverse application domains. The data holders want to train or infer with private data while exploiting accelerators, such as GPUs, that are hosted in the cloud. Cloud systems are vulnerable to attackers that compromise the privacy of data and integrity of computations. Tackling such a challenge requires unifying theoretical privacy algorithms with hardware security capabilities. This paper presents DarKnight, a framework for large DNN training while protecting input privacy and computation integrity. DarKnight relies on cooperative execution between trusted execution environments (TEE) and accelerators, where the TEE provides privacy and integrity verification, while accelerators perform the bulk of the linear algebraic computation to optimize the performance. In particular, DarKnight uses a customized data encoding strategy based on matrix masking to create input obfuscation within a TEE. The obfuscated data is then offloaded to GPUs for fast linear algebraic computation. DarKnight's data obfuscation strategy provides provable data privacy and computation integrity in the cloud servers. While prior works tackle inference privacy and cannot be utilized for training, DarKnight's encoding scheme is designed to support both training and inference.
△ Less
Submitted 30 June, 2022;
originally announced July 2022.
-
Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings
Authors:
Tiantian Feng,
Hanieh Hashemi,
Rajat Hebbar,
Murali Annavaram,
Shrikanth S. Narayanan
Abstract:
Speech emotion recognition (SER) processes speech signals to detect and characterize expressed perceived emotions. Many SER application systems often acquire and transmit speech data collected at the client-side to remote cloud platforms for inference and decision making. However, speech data carry rich information not only about emotions conveyed in vocal expressions, but also other sensitive dem…
▽ More
Speech emotion recognition (SER) processes speech signals to detect and characterize expressed perceived emotions. Many SER application systems often acquire and transmit speech data collected at the client-side to remote cloud platforms for inference and decision making. However, speech data carry rich information not only about emotions conveyed in vocal expressions, but also other sensitive demographic traits such as gender, age and language background. Consequently, it is desirable for SER systems to have the ability to classify emotion constructs while preventing unintended/improper inferences of sensitive and demographic information. Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing their local data. This training approach appears secure and can improve privacy for SER. However, recent works have demonstrated that FL approaches are still vulnerable to various privacy attacks like reconstruction attacks and membership inference attacks. Although most of these have focused on computer vision applications, such information leakages exist in the SER systems trained using the FL technique. To assess the information leakage of SER systems trained using FL, we propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters, corresponding to the FedSGD and the FedAvg training algorithms, respectively. As a use case, we empirically evaluate our approach for predicting the client's gender information using three SER benchmark datasets: IEMOCAP, CREMA-D, and MSP-Improv. We show that the attribute inference attack is achievable for SER systems trained using FL. We further identify that most information leakage possibly comes from the first layer in the SER model.
△ Less
Submitted 22 December, 2022; v1 submitted 26 December, 2021;
originally announced December 2021.
-
Adaptive Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning
Authors:
Tingting Tang,
Ramy E. Ali,
Hanieh Hashemi,
Tynan Gangwani,
Salman Avestimehr,
Murali Annavaram
Abstract:
Stragglers, Byzantine workers, and data privacy are the main bottlenecks in distributed cloud computing. Some prior works proposed coded computing strategies to jointly address all three challenges. They require either a large number of workers, a significant communication cost or a significant computational complexity to tolerate Byzantine workers. Much of the overhead in prior schemes comes from…
▽ More
Stragglers, Byzantine workers, and data privacy are the main bottlenecks in distributed cloud computing. Some prior works proposed coded computing strategies to jointly address all three challenges. They require either a large number of workers, a significant communication cost or a significant computational complexity to tolerate Byzantine workers. Much of the overhead in prior schemes comes from the fact that they tightly couple coding for all three problems into a single framework. In this paper, we propose Adaptive Verifiable Coded Computing (AVCC) framework that decouples the Byzantine node detection challenge from the straggler tolerance. AVCC leverages coded computing just for handling stragglers and privacy, and then uses an orthogonal approach that leverages verifiable computing to mitigate Byzantine workers. Furthermore, AVCC dynamically adapts its coding scheme to trade-off straggler tolerance with Byzantine protection. We evaluate AVCC on a compute-intensive distributed logistic regression application. Our experiments show that AVCC achieves up to $4.2\times$ speedup and up to $5.1\%$ accuracy improvement over the state-of-the-art Lagrange coded computing approach (LCC). AVCC also speeds up the conventional uncoded implementation of distributed logistic regression by up to $7.6\times$, and improves the test accuracy by up to $12.1\%$.
△ Less
Submitted 22 March, 2022; v1 submitted 27 July, 2021;
originally announced July 2021.
-
Automatic Construction of Enterprise Knowledge Base
Authors:
Junyi Chai,
Yujie He,
Homa Hashemi,
Bing Li,
Daraksha Parveen,
Ranganath Kondapally,
Wen** Xu
Abstract:
In this paper, we present an automatic knowledge base construction system from large scale enterprise documents with minimal efforts of human intervention. In the design and deployment of such a knowledge mining system for enterprise, we faced several challenges including data distributional shift, performance evaluation, compliance requirements and other practical issues. We leveraged state-of-th…
▽ More
In this paper, we present an automatic knowledge base construction system from large scale enterprise documents with minimal efforts of human intervention. In the design and deployment of such a knowledge mining system for enterprise, we faced several challenges including data distributional shift, performance evaluation, compliance requirements and other practical issues. We leveraged state-of-the-art deep learning models to extract information (named entities and definitions) at per document level, then further applied classical machine learning techniques to process global statistical information to improve the knowledge base. Experimental results are reported on actual enterprise documents. This system is currently serving as part of a Microsoft 365 service.
△ Less
Submitted 29 June, 2021;
originally announced June 2021.
-
Current Challenges and Future Directions in Podcast Information Access
Authors:
Rosie Jones,
Hamed Zamani,
Markus Schedl,
Ching-Wei Chen,
Sravana Reddy,
Ann Clifton,
Jussi Karlgren,
Helia Hashemi,
Aasish Pappu,
Zahra Nazari,
Longqi Yang,
Oguz Semerci,
Hugues Bouchard,
Ben Carterette
Abstract:
Podcasts are spoken documents across a wide-range of genres and styles, with growing listenership across the world, and a rapidly lowering barrier to entry for both listeners and creators. The great strides in search and recommendation in research and industry have yet to see impact in the podcast space, where recommendations are still largely driven by word of mouth. In this perspective paper, we…
▽ More
Podcasts are spoken documents across a wide-range of genres and styles, with growing listenership across the world, and a rapidly lowering barrier to entry for both listeners and creators. The great strides in search and recommendation in research and industry have yet to see impact in the podcast space, where recommendations are still largely driven by word of mouth. In this perspective paper, we highlight the many differences between podcasts and other media, and discuss our perspective on challenges and future research directions in the domain of podcast information access.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
Byzantine-Robust and Privacy-Preserving Framework for FedML
Authors:
Hanieh Hashemi,
Yongqin Wang,
Chuan Guo,
Murali Annavaram
Abstract:
Federated learning has emerged as a popular paradigm for collaboratively training a model from data distributed among a set of clients. This learning setting presents, among others, two unique challenges: how to protect privacy of the clients' data during training, and how to ensure integrity of the trained model. We propose a two-pronged solution that aims to address both challenges under a singl…
▽ More
Federated learning has emerged as a popular paradigm for collaboratively training a model from data distributed among a set of clients. This learning setting presents, among others, two unique challenges: how to protect privacy of the clients' data during training, and how to ensure integrity of the trained model. We propose a two-pronged solution that aims to address both challenges under a single framework. First, we propose to create secure enclaves using a trusted execution environment (TEE) within the server. Each client can then encrypt their gradients and send them to verifiable enclaves. The gradients are decrypted within the enclave without the fear of privacy breaches. However, robustness check computations in a TEE are computationally prohibitive. Hence, in the second step, we perform a novel gradient encoding that enables TEEs to encode the gradients and then offloading Byzantine check computations to accelerators such as GPUs. Our proposed approach provides theoretical bounds on information leakage and offers a significant speed-up over the baseline in empirical evaluation.
△ Less
Submitted 5 May, 2021;
originally announced May 2021.
-
Privacy and Integrity Preserving Training Using Trusted Hardware
Authors:
Hanieh Hashemi,
Yongqin Wang,
Murali Annavaram
Abstract:
Privacy and security-related concerns are growing as machine learning reaches diverse application domains. The data holders want to train with private data while exploiting accelerators, such as GPUs, that are hosted in the cloud. However, Cloud systems are vulnerable to attackers that compromise the privacy of data and integrity of computations. This work presents DarKnight, a framework for large…
▽ More
Privacy and security-related concerns are growing as machine learning reaches diverse application domains. The data holders want to train with private data while exploiting accelerators, such as GPUs, that are hosted in the cloud. However, Cloud systems are vulnerable to attackers that compromise the privacy of data and integrity of computations. This work presents DarKnight, a framework for large DNN training while protecting input privacy and computation integrity. DarKnight relies on cooperative execution between trusted execution environments (TEE) and accelerators, where the TEE provides privacy and integrity verification, while accelerators perform the computation heavy linear algebraic operations.
△ Less
Submitted 1 May, 2021;
originally announced May 2021.
-
GenoML: Automated Machine Learning for Genomics
Authors:
Mary B. Makarious,
Hampton L. Leonard,
Dan Vitale,
Hirotaka Iwaki,
David Saffo,
Lana Sargent,
Anant Dadu,
Eduardo Salmerón Castaño,
John F. Carter,
Melina Maleknia,
Juan A. Botia,
Cornelis Blauwendraat,
Roy H. Campbell,
Sayed Hadi Hashemi,
Andrew B. Singleton,
Mike A. Nalls,
Faraz Faghri
Abstract:
GenoML is a Python package automating machine learning workflows for genomics (genetics and multi-omics) with an open science philosophy. Genomics data require significant domain expertise to clean, pre-process, harmonize and perform quality control of the data. Furthermore, tuning, validation, and interpretation involve taking into account the biology and possibly the limitations of the underlyin…
▽ More
GenoML is a Python package automating machine learning workflows for genomics (genetics and multi-omics) with an open science philosophy. Genomics data require significant domain expertise to clean, pre-process, harmonize and perform quality control of the data. Furthermore, tuning, validation, and interpretation involve taking into account the biology and possibly the limitations of the underlying data collection, protocols, and technology. GenoML's mission is to bring machine learning for genomics and clinical data to non-experts by develo** an easy-to-use tool that automates the full development, evaluation, and deployment process. Emphasis is put on open science to make workflows easily accessible, replicable, and transferable within the scientific community. Source code and documentation is available at https://genoml.com.
△ Less
Submitted 4 March, 2021;
originally announced March 2021.
-
Secure and Fault Tolerant Decentralized Learning
Authors:
Saurav Prakash,
Hanieh Hashemi,
Yongqin Wang,
Murali Annavaram,
Salman Avestimehr
Abstract:
Federated learning (FL) is a promising paradigm for training a global model over data distributed across multiple data owners without centralizing clients' raw data. However, sharing of local model updates can also reveal information of clients' local datasets. Trusted execution environments (TEEs) within the FL server have been recently deployed by companies like Meta for secure aggregation. Howe…
▽ More
Federated learning (FL) is a promising paradigm for training a global model over data distributed across multiple data owners without centralizing clients' raw data. However, sharing of local model updates can also reveal information of clients' local datasets. Trusted execution environments (TEEs) within the FL server have been recently deployed by companies like Meta for secure aggregation. However, secure aggregation can suffer from error-prone local updates sent by clients that become faulty during training due to underlying device malfunctions. Also, data heterogeneity across clients makes fault mitigation challenging, as even updates from normal clients are dissimilar. Thus, most of the prior fault tolerant methods, which treat any local update differing from the majority of other updates as faulty, perform poorly. We propose DiverseFL to make model aggregation secure as well as robust to faults. In DiverseFL, any client whose local model update diverges from its associated guiding update is tagged as being faulty. To implement our novel per-client criteria for fault mitigation, DiverseFL creates a TEE-based secure enclave within the FL server, which in addition to performing secure aggregation for carrying out the global model update step, securely receives a small representative sample of local data from each client only once before training, and computes guiding updates for each participating client during training. Thus, DiverseFL provides security against privacy leakage as well as robustness against faulty clients. In experiments, DiverseFL consistently achieves significant improvements in absolute test accuracy over prior fault mitigation benchmarks. DiverseFL also performs closely to OracleSGD, where server combines updates only from the normal clients. We also analyze the convergence rate of DiverseFL under non-IID data and standard convexity assumptions.
△ Less
Submitted 13 September, 2022; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search
Authors:
Helia Hashemi,
Hamed Zamani,
W. Bruce Croft
Abstract:
Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently but the accurate utilization of user responses to clarifying questions has been relatively les…
▽ More
Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently but the accurate utilization of user responses to clarifying questions has been relatively less explored. In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources that weights each term in the conversation. We evaluate this Guided Transformer model in a conversational search scenario that includes clarifying questions. In our experiments, we use two separate external sources, including the top retrieved documents and a set of different possible clarifying questions for the query. We implement the proposed representation learning model for two downstream tasks in conversational search; document retrieval and next clarifying question selection. Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.
△ Less
Submitted 12 June, 2020;
originally announced June 2020.
-
DarKnight: A Data Privacy Scheme for Training and Inference of Deep Neural Networks
Authors:
Hanieh Hashemi,
Yongqin Wang,
Murali Annavaram
Abstract:
Protecting the privacy of input data is of growing importance as machine learning methods reach new application domains. In this paper, we provide a unified training and inference framework for large DNNs while protecting input privacy and computation integrity. Our approach called DarKnight uses a novel data blinding strategy using matrix masking to create input obfuscation within a trusted execu…
▽ More
Protecting the privacy of input data is of growing importance as machine learning methods reach new application domains. In this paper, we provide a unified training and inference framework for large DNNs while protecting input privacy and computation integrity. Our approach called DarKnight uses a novel data blinding strategy using matrix masking to create input obfuscation within a trusted execution environment (TEE). Our rigorous mathematical proof demonstrates that our blinding process provides information-theoretic privacy guarantee by bounding information leakage. The obfuscated data can then be offloaded to any GPU for accelerating linear operations on blinded data. The results from linear operations on blinded data are decoded before performing non-linear operations within the TEE. This cooperative execution allows DarKnight to exploit the computational power of GPUs to perform linear operations while exploiting TEEs to protect input privacy. We implement DarKnight on an Intel SGX TEE augmented with a GPU to evaluate its performance.
△ Less
Submitted 15 October, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Caramel: Accelerating Decentralized Distributed Deep Learning with Computation Scheduling
Authors:
Sayed Hadi Hashemi,
Sangeetha Abdu Jyothi,
Brighten Godfrey,
Roy Campbell
Abstract:
The method of choice for parameter aggregation in Deep Neural Network (DNN) training, a network-intensive task, is shifting from the Parameter Server model to decentralized aggregation schemes (AllReduce) inspired by theoretical guarantees of better performance. However, current implementations of AllReduce overlook the interdependence of communication and computation, resulting in significant per…
▽ More
The method of choice for parameter aggregation in Deep Neural Network (DNN) training, a network-intensive task, is shifting from the Parameter Server model to decentralized aggregation schemes (AllReduce) inspired by theoretical guarantees of better performance. However, current implementations of AllReduce overlook the interdependence of communication and computation, resulting in significant performance degradation. In this paper, we develop Caramel, a system that accelerates decentralized distributed deep learning through model-aware computation scheduling and communication optimizations for AllReduce. Caramel achieves this goal through (a) computation DAG scheduling that expands the feasible window of transfer for each parameter (transfer boundaries), and (b) network optimizations for smoothening of the load including adaptive batching and pipelining of parameter transfers. Caramel maintains the correctness of the dataflow model, is hardware-independent, and does not require any user-level or framework-level changes. We implement Caramel over TensorFlow and show that the iteration time of DNN training can be improved by up to 3.62x in a cloud environment.
△ Less
Submitted 29 April, 2020;
originally announced April 2020.
-
ANTIQUE: A Non-Factoid Question Answering Benchmark
Authors:
Helia Hashemi,
Mohammad Aliannejadi,
Hamed Zamani,
W. Bruce Croft
Abstract:
Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. In this paper, we develop a…
▽ More
Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. In this paper, we develop and release a collection of 2,626 open-domain non-factoid questions from a diverse set of categories. The dataset, called ANTIQUE, contains 34,011 manual relevance annotations. The questions were asked by real users in a community question answering service, i.e., Yahoo! Answers. Relevance judgments for all the answers to each question were collected through crowdsourcing. To facilitate further research, we also include a brief analysis of the data as well as baseline results on both classical and recently developed neural IR models.
△ Less
Submitted 19 August, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
PartitionedVC: Partitioned External Memory Graph Analytics Framework for SSDs
Authors:
Kiran Kumar Matam,
Hanieh Hashemi,
Murali Annavaram
Abstract:
Graph analytics are at the heart of a broad range of applications such as drug discovery, page ranking, and recommendation systems. When graph size exceeds memory size, out-of-core graph processing is needed. For the widely used external memory graph processing systems, accessing storage becomes the bottleneck. We make the observation that nearly all graph algorithms have a dynamically varying num…
▽ More
Graph analytics are at the heart of a broad range of applications such as drug discovery, page ranking, and recommendation systems. When graph size exceeds memory size, out-of-core graph processing is needed. For the widely used external memory graph processing systems, accessing storage becomes the bottleneck. We make the observation that nearly all graph algorithms have a dynamically varying number of active vertices that must be processed in each iteration. However, existing graph processing frameworks, such as GraphChi, load the entire graph in each iteration even if a small fraction of the graph is active. This limitation is due to the structure of the data storage used by these systems. In this work, we propose to use a compressed sparse row (CSR) based graph storage that is more amenable for selectively loading only a few active vertices in each iteration. But CSR based systems suffers from random update propagation to many target vertices. To solve this challenge, we propose to use a multi-log update mechanism that logs updates separately, rather than directly update the active edges in a graph. Our proposed multi-log system maintains a separate log per each vertex interval. This separation enables us to efficiently process each vertex interval by just loading the corresponding log. Further, while accessing SSD pages with fewer active vertex data, we reduce the read amplification due to the page granular accesses in SSD by logging the active vertex data in the current iteration and efficiently reading the log in the next iteration. Over the current state of the art out-of-core graph processing framework, our PartitionedVC improves performance by up to $17.84\times$, $1.19\times$, $1.65\times$, $1.38\times$, $3.15\times$, and $6.00\times$ for the widely used bfs, pagerank, community detection, graph coloring, maximal independent set, and random-walk applications, respectively.
△ Less
Submitted 11 February, 2020; v1 submitted 10 May, 2019;
originally announced May 2019.
-
Deep-learning PDEs with unlabeled data and hardwiring physics laws
Authors:
S. Mohammad H. Hashemi,
Demetri Psaltis
Abstract:
Providing fast and accurate solutions to partial differential equations is a problem of continuous interest to the fields of applied mathematics and physics. With the recent advances in machine learning, the adoption learning techniques in this domain is being eagerly pursued. We build upon earlier works on linear and homogeneous PDEs, and develop convolutional deep neural networks that can accura…
▽ More
Providing fast and accurate solutions to partial differential equations is a problem of continuous interest to the fields of applied mathematics and physics. With the recent advances in machine learning, the adoption learning techniques in this domain is being eagerly pursued. We build upon earlier works on linear and homogeneous PDEs, and develop convolutional deep neural networks that can accurately solve nonlinear and non-homogeneous equations without the need for labeled data. The architecture of these networks is readily accessible for scientific disciplines who deal with PDEs and know the basics of deep learning.
△ Less
Submitted 13 April, 2019;
originally announced April 2019.
-
Output-Oblivious Stochastic Chemical Reaction Networks
Authors:
Ben Chugg,
Anne Condon,
Hooman Hashemi
Abstract:
We classify the functions $f:\mathbb{N}^2 \rightarrow \mathbb{N}$ which are stably computable by output-oblivious Stochastic Chemical Reaction Networks (CRNs), i.e., systems of reactions in which output species are never reactants. While it is known that precisely the semilinear functions are stably computable by CRNs, such CRNs sometimes rely on initially producing too many output species and the…
▽ More
We classify the functions $f:\mathbb{N}^2 \rightarrow \mathbb{N}$ which are stably computable by output-oblivious Stochastic Chemical Reaction Networks (CRNs), i.e., systems of reactions in which output species are never reactants. While it is known that precisely the semilinear functions are stably computable by CRNs, such CRNs sometimes rely on initially producing too many output species and then consuming the excess in order to reach a correct stable state. These CRNs may be difficult to integrate into larger systems: if the output of a CRN $\mathcal{C}$ becomes the input to a downstream CRN $\mathcal{C}'$, then $\mathcal{C}'$ could inadvertently consume too many outputs before $\mathcal{C}$ stabilizes. If, on the other hand, $\mathcal{C}$ is output-oblivious then $\mathcal{C}'$ may consume $\mathcal{C}$'s output as soon as it is available. In this work we prove that a semilinear function $f:\mathbb{N}^2 \rightarrow \mathbb{N}$ is stably computable by an output-oblivious CRN with a leader if and only if it is both increasing and either grid-affine (intuitively, its domains are congruence classes), or the minimum of a finite set of fissure functions (intuitively, functions behaving like the min function).
△ Less
Submitted 30 August, 2022; v1 submitted 7 December, 2018;
originally announced December 2018.
-
TicTac: Accelerating Distributed Deep Learning with Communication Scheduling
Authors:
Sayed Hadi Hashemi,
Sangeetha Abdu Jyothi,
Roy H. Campbell
Abstract:
State-of-the-art deep learning systems rely on iterative distributed training to tackle the increasing complexity of models and input data. The iteration time in these communication-heavy systems depends on the computation time, communication time and the extent of overlap of computation and communication.
In this work, we identify a shortcoming in systems with graph representation for computati…
▽ More
State-of-the-art deep learning systems rely on iterative distributed training to tackle the increasing complexity of models and input data. The iteration time in these communication-heavy systems depends on the computation time, communication time and the extent of overlap of computation and communication.
In this work, we identify a shortcoming in systems with graph representation for computation, such as TensorFlow and PyTorch, that result in high variance in iteration time --- random order of received parameters across workers. We develop a system, TicTac, to improve the iteration time by fixing this issue in distributed deep learning with Parameter Servers while guaranteeing near-optimal overlap of communication and computation. TicTac identifies and enforces an order of network transfers which improves the iteration time using prioritization. Our system is implemented over TensorFlow and requires no changes to the model or developer inputs. TicTac improves the throughput by up to $37.7\%$ in inference and $19.2\%$ in training, while also reducing straggler effect by up to $2.3\times$. Our code is publicly available.
△ Less
Submitted 3 October, 2018; v1 submitted 8 March, 2018;
originally announced March 2018.
-
Toward Scalable Machine Learning and Data Mining: the Bioinformatics Case
Authors:
Faraz Faghri,
Sayed Hadi Hashemi,
Mohammad Babaeizadeh,
Mike A. Nalls,
Saurabh Sinha,
Roy H. Campbell
Abstract:
In an effort to overcome the data deluge in computational biology and bioinformatics and to facilitate bioinformatics research in the era of big data, we identify some of the most influential algorithms that have been widely used in the bioinformatics community. These top data mining and machine learning algorithms cover classification, clustering, regression, graphical model-based learning, and d…
▽ More
In an effort to overcome the data deluge in computational biology and bioinformatics and to facilitate bioinformatics research in the era of big data, we identify some of the most influential algorithms that have been widely used in the bioinformatics community. These top data mining and machine learning algorithms cover classification, clustering, regression, graphical model-based learning, and dimensionality reduction. The goal of this study is to guide the focus of scalable computing experts in the endeavor of applying new storage and scalable computation designs to bioinformatics algorithms that merit their attention most, following the engineering maxim of "optimize the common case".
△ Less
Submitted 29 September, 2017;
originally announced October 2017.
-
Decentralized User-Centric Access Control using PubSub over Blockchain
Authors:
Sayed Hadi Hashemi,
Faraz Faghri,
Roy H Campbell
Abstract:
We present a mechanism that puts users in the center of control and empowers them to dictate the access to their collections of data. Revisiting the fundamental mechanisms in security for providing protection, our solution uses capabilities, access lists, and access rights following well-understood formal notions for reasoning about access. This contribution presents a practical, correct, auditabl…
▽ More
We present a mechanism that puts users in the center of control and empowers them to dictate the access to their collections of data. Revisiting the fundamental mechanisms in security for providing protection, our solution uses capabilities, access lists, and access rights following well-understood formal notions for reasoning about access. This contribution presents a practical, correct, auditable, transparent, distributed, and decentralized mechanism that is well-matched to the current emerging environments including Internet of Things, smart city, precision medicine, and autonomous cars. It is based on well-tested principles and practices used in a distributed authorization, cryptocurrencies, and scalable computing.
△ Less
Submitted 29 September, 2017;
originally announced October 2017.
-
Performance Modeling of Distributed Deep Neural Networks
Authors:
Sayed Hadi Hashemi,
Shadi A. Noghabi,
William Gropp,
Roy H Campbell
Abstract:
During the past decade, machine learning has become extremely popular and can be found in many aspects of our every day life. Nowayadays with explosion of data while rapid growth of computation capacity, Distributed Deep Neural Networks (DDNNs) which can improve their performance linearly with more computation resources, have become hot and trending. However, there has not been an in depth study o…
▽ More
During the past decade, machine learning has become extremely popular and can be found in many aspects of our every day life. Nowayadays with explosion of data while rapid growth of computation capacity, Distributed Deep Neural Networks (DDNNs) which can improve their performance linearly with more computation resources, have become hot and trending. However, there has not been an in depth study of the performance of these systems, and how well they scale.
In this paper we analyze CNTK, one of the most commonly used DDNNs, by first building a performance model and then evaluating the system two settings: a small cluster with all nodes in a single rack connected to a top of rack switch, and in large scale using Blue Waters with arbitary placement of nodes. Our main focus was the scalability of the system with respect to adding more nodes. Based on our results, this system has an excessive initialization overhead because of poor I/O utilization which dominates the whole execution time. Because of this, the system does not scale beyond a few nodes (4 in Blue Waters). Additionally, due to a single server-multiple worker design the server becomes a bottleneck after 16 nodes limiting the scalability of the CNTK.
△ Less
Submitted 14 December, 2016; v1 submitted 1 December, 2016;
originally announced December 2016.
-
Hoffmann-Ostenhof's conjecture for traceable cubic graphs
Authors:
F. Abdolhosseini,
S. Akbari,
H. Hashemi,
M. S. Moradian
Abstract:
It was conjectured by Hoffmann-Ostenhof that the edge set of every connected cubic graph can be decomposed into a spanning tree, a matching and a family of cycles. In this paper, we show that this conjecture holds for traceable cubic graphs.
It was conjectured by Hoffmann-Ostenhof that the edge set of every connected cubic graph can be decomposed into a spanning tree, a matching and a family of cycles. In this paper, we show that this conjecture holds for traceable cubic graphs.
△ Less
Submitted 16 July, 2016;
originally announced July 2016.
-
Experimental Demonstration of Nanosecond Accuracy Wireless Network Synchronization
Authors:
Marcelo Segura,
S. Niranjayan,
Hossein Hashemi,
Andreas F. Molisch
Abstract:
Accurate wireless timing synchronization has been an extremely important topic in wireless sensor networks, required in applications ranging from distributed beam forming to precision localization and navigation. However, it is very challenging to realize, in particular when the required accuracy should be better than the runtime between the nodes. This work presents, to our knowledge for the firs…
▽ More
Accurate wireless timing synchronization has been an extremely important topic in wireless sensor networks, required in applications ranging from distributed beam forming to precision localization and navigation. However, it is very challenging to realize, in particular when the required accuracy should be better than the runtime between the nodes. This work presents, to our knowledge for the first time, an experimental timing synchronization scheme that achieves a timing accuracy better than 5ns rms in a network with 4 nodes. The experimental hardware is built from commercially available components and based on software defined ultra wideband transceivers. The protocol for establishing the synchronization is based on our recently developed blink protocol that can scale from the small network demonstrated here to larger networks of hundreds or thousands of nodes.
△ Less
Submitted 26 September, 2014;
originally announced September 2014.
-
An improved genetic algorithm with a local optimization strategy and an extra mutation level for solving traveling salesman problem
Authors:
Keivan Borna,
Vahid Haji Hashemi
Abstract:
The Traveling salesman problem (TSP) is proved to be NP-complete in most cases. The genetic algorithm (GA) is one of the most useful algorithms for solving this problem. In this paper a conventional GA is compared with an improved hybrid GA in solving TSP. The improved or hybrid GA consist of conventional GA and two local optimization strategies. The first strategy is extracting all sequential gro…
▽ More
The Traveling salesman problem (TSP) is proved to be NP-complete in most cases. The genetic algorithm (GA) is one of the most useful algorithms for solving this problem. In this paper a conventional GA is compared with an improved hybrid GA in solving TSP. The improved or hybrid GA consist of conventional GA and two local optimization strategies. The first strategy is extracting all sequential groups including four cities of samples and changing the two central cities with each other. The second local optimization strategy is similar to an extra mutation process. In this step with a low probability a sample is selected. In this sample two random cities are defined and the path between these cities is reversed. The computation results show that the proposed method also finds better paths than the conventional GA within an acceptable computation time.
△ Less
Submitted 10 September, 2014;
originally announced September 2014.