Search | arXiv e-print repository

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2306.05203

A cognitive process approach to modeling gap acceptance in overtaking

Authors: Samir H. A. Mohammad, Haneen Farah, Arkady Zgonnikov

Abstract: Driving automation holds significant potential for enhancing traffic safety. However, effectively handling interactions with human drivers in mixed traffic remains a challenging task. Several models exist that attempt to capture human behavior in traffic interactions, often focusing on gap acceptance. However, it is not clear how models of an individual driver's gap acceptance can be translated to dynamic human-AV interactions in the context of high-speed scenarios like overtaking. In this study, we address this issue by employing a cognitive process approach to describe the dynamic interactions by the oncoming vehicle during overtaking maneuvers. Our findings reveal that by incorporating an initial decision-making bias dependent on the initial velocity into existing drift-diffusion models, we can accurately describe the qualitative patterns of overtaking gap acceptance observed previously. Our results demonstrate the potential of the cognitive process approach in modeling human overtaking behavior when the oncoming vehicle is an AV. To this end, this study contributes to the development of effective strategies for ensuring safe and efficient overtaking interactions between human drivers and AVs.
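
The idea of a drift-diffusion model with a velocity-dependent starting point can be illustrated with a minimal sketch. This is not the authors' model: the linear form of `initial_bias`, its `slope` and `intercept` values, and all other parameter names and defaults are hypothetical, chosen only to show how an initial bias shifts the starting point of evidence accumulation.

```python
import random


def initial_bias(initial_velocity, slope=-0.02, intercept=0.5):
    """Hypothetical linear mapping from initial velocity to starting evidence.

    The functional form and coefficients are assumptions for illustration.
    """
    return intercept + slope * initial_velocity


def simulate_ddm(drift, bias, boundary=1.0, noise=1.0, dt=0.01,
                 max_time=5.0, rng=None):
    """Simulate one drift-diffusion trial with Euler-Maruyama steps.

    Evidence starts at `bias` and accumulates until it crosses +boundary
    ("accept" the gap) or -boundary ("reject"), or until max_time elapses.
    Returns a (decision, decision_time) tuple.
    """
    rng = rng or random.Random()
    x = bias
    t = 0.0
    sqrt_dt = dt ** 0.5
    while t < max_time:
        if x >= boundary:
            return "accept", t
        if x <= -boundary:
            return "reject", t
        # Deterministic drift plus Gaussian noise scaled by sqrt(dt).
        x += drift * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
    return "no_decision", t
```

With noise switched off (`noise=0.0`), a starting point closer to the upper boundary reaches "accept" sooner, which is the qualitative effect a starting-point bias has in this family of models.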

Submitted 8 June, 2023; originally announced June 2023.

arXiv:2304.14269

Performance Analysis of DNA Crossbar Arrays for High-Density Memory Storage Applications

Authors: Arpan De, Hashem Mohammad, Yiren Wang, Rajkumar Kubendran, Arindam K. Das, M. P. Anantram

Abstract: Deoxyribonucleic acid (DNA) has emerged as a promising building block for next-generation ultra-high density storage devices. Although DNA has high durability and extremely high density in nature, its potential as the basis of storage devices is currently hindered by limitations such as expensive and complex fabrication processes and time-consuming read-write operations. In this article, we propose the use of a DNA crossbar array architecture for an electrically readable Read-Only Memory (DNA-ROM). While information can be written error-free to a DNA-ROM array using appropriate sequence encoding, its read accuracy can be affected by several factors such as array size, interconnect resistance, and Fermi energy deviations from HOMO levels of DNA strands employed in the crossbar. We study the impact of array size and interconnect resistance on the bit error rate of a DNA-ROM array through extensive Monte Carlo simulations. We have also analyzed the performance of our proposed DNA crossbar array for an image storage application, as a function of array size and interconnect resistance. While we expect that future advances in bioengineering and materials science will address some of the fabrication challenges associated with DNA crossbar arrays, we believe that the comprehensive body of results we present in this paper establishes the technical viability of DNA crossbar arrays as low-power, high-density storage devices. Finally, our analysis of array performance vis-a-vis interconnect resistance should provide valuable insights into aspects of the fabrication process such as the proper choice of interconnects necessary for ensuring high read accuracies.

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2209.06727

Automated Fidelity Assessment for Strategy Training in Inpatient Rehabilitation using Natural Language Processing

Authors: Hunter Osterhoudt, Courtney E. Schneider, Haneef A Mohammad, Minmei Shih, Alexandra E. Harper, Leming Zhou, Elizabeth R Skidmore, Yanshan Wang

Abstract: Strategy training is a multidisciplinary rehabilitation approach that teaches skills to reduce disability among those with cognitive impairments following a stroke. Strategy training has been shown in randomized, controlled clinical trials to be a more feasible and efficacious intervention for promoting independence than traditional rehabilitation approaches. A standardized fidelity assessment is used to measure adherence to treatment principles by examining guided and directed verbal cues in video recordings of rehabilitation sessions. Although the fidelity assessment for detecting guided and directed verbal cues is valid and feasible for single-site studies, it can become labor intensive, time consuming, and expensive in large, multi-site pragmatic trials. To address this challenge to widespread strategy training implementation, we leveraged natural language processing (NLP) techniques to automate the strategy training fidelity assessment, i.e., to automatically identify guided and directed verbal cues from video recordings of rehabilitation sessions. We developed a rule-based NLP algorithm, a long short-term memory (LSTM) model, and a bidirectional encoder representations from transformers (BERT) model for this task. The best performance was achieved by the BERT model with a 0.8075 F1-score. This BERT model was verified on an external validation dataset collected from a separate major regional health system and achieved an F1 score of 0.8259, which shows that the BERT model generalizes well. The findings from this study hold widespread promise in psychology and rehabilitation intervention research and practice.

Submitted 24 January, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: Accepted at the AMIA Informatics Summit 2023

arXiv:2204.09601

Extraction of Sleep Information from Clinical Notes of Patients with Alzheimer's Disease Using Natural Language Processing

Authors: Sonish Sivarajkumar, Thomas Yu Chow Tam, Haneef Ahamed Mohammad, Samual Viggiano, David Oniani, Shyam Visweswaran, Yanshan Wang

Abstract: Alzheimer's Disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown to be critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience. A gold standard dataset is created from manual annotation of 570 randomly sampled clinical note documents from adSLEEP, a corpus of 192,000 de-identified clinical notes of 7,266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based Natural Language Processing (NLP) algorithm, machine learning models, and Large Language Model (LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, napping, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset. The rule-based NLP algorithm achieved the best F1 performance across all sleep-related concepts. In terms of Positive Predictive Value (PPV), the rule-based NLP algorithm achieved 1.00 for daytime sleepiness and sleep duration; the machine learning models achieved 0.95 for napping, 0.86 for bad sleep quality, and 0.90 for snoring; and LLAMA2 with fine-tuning achieved 0.93 for night wakings, 0.89 for sleep problem, and 1.00 for sleep duration. The results show that the rule-based NLP algorithm consistently achieved the best performance for all sleep concepts. This study focused on the clinical notes of patients with AD, but could be extended to general sleep information extraction for other diseases.
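
For readers unfamiliar with the metrics reported above, the following minimal sketch shows how PPV (precision), recall, and F1 are computed from binary gold-standard and predicted labels. The function name and the example labels are hypothetical and are not taken from the paper or its dataset.

```python
def ppv_recall_f1(gold, predicted):
    """Compute PPV (precision), recall, and F1 for binary labels.

    `gold` and `predicted` are equal-length sequences of 0/1 values,
    where 1 means the concept (e.g. a sleep-related mention) is present.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives

    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of PPV and recall.
    f1 = 2 * ppv * recall / (ppv + recall) if (ppv + recall) else 0.0
    return ppv, recall, f1


# Illustrative labels only: 2 true positives, 1 false positive, 1 false negative.
gold_labels = [1, 1, 0, 0, 1]
predicted_labels = [1, 0, 0, 1, 1]
ppv, recall, f1 = ppv_recall_f1(gold_labels, predicted_labels)
```

A PPV of 1.00, as reported for some concepts, means every predicted mention was correct; it says nothing about how many true mentions were missed, which is why F1 is reported alongside it.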

Submitted 15 March, 2024; v1 submitted 8 March, 2022; originally announced April 2022.

arXiv:2201.03556

Reproducing BowNet: Learning Representations by Predicting Bags of Visual Words

Authors: Harry Nguyen, Stone Yun, Hisham Mohammad

Abstract: This work aims to reproduce results from the CVPR 2020 paper by Gidaris et al. Self-supervised learning (SSL) is used to learn feature representations of an image using an unlabeled dataset. This work proposes to use bag-of-words (BoW) deep feature descriptors as a self-supervised learning target to learn robust, deep representations. BowNet is trained to reconstruct the histogram of visual words (i.e., the deep BoW descriptor) of a reference image when presented with a perturbed version of the image as input. Thus, this method aims to learn perturbation-invariant and context-aware image features that can be useful for few-shot tasks or supervised downstream tasks. In the paper, the author describes BowNet as a network consisting of a convolutional feature extractor $\Phi(\cdot)$ and a Dense-softmax layer $\Omega(\cdot)$ trained to predict BoW features from images. After BoW training, the features of $\Phi$ are used in downstream tasks. For this challenge we were trying to build and train a network that could reproduce the CIFAR-100 accuracy improvements reported in the original paper. However, we were unsuccessful in reproducing an accuracy improvement comparable to what the authors mentioned. This could be due to a variety of factors, and we believe that time constraints were the primary bottleneck.

Submitted 14 January, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

Comments: This is a reproducibility project. Original work is by Gidaris et al. published in CVPR 2020. PyTorch implementation is public on GitHub. v2 clarifies comments regarding communication with original authors

arXiv:1103.3382

A New Localized Network Based Routing Model in Computer and Communication networks

Authors: Abdulbaset H. Mohammad

Abstract: Routing algorithms are network layer entities, and the performance of any routing algorithm depends on the underlying network. Localized routing algorithms avoid the problems associated with the maintenance of global network state by using statistics of flow blocking probabilities. We developed a new network parameter that can be used to predict which network topology gives better performance on the quality of localized QoS routing algorithms. Using this parameter, we explore a simple model that can be rewired to increase performance. We find that this model has a small characteristic path length. Simulations of random and complex networks show that performance is significantly affected by the level of connectivity.

Submitted 17 March, 2011; originally announced March 2011.

Comments: 18 pages, 9 figures

Showing 1–7 of 7 results for author: Mohammad, H