-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
A cognitive process approach to modeling gap acceptance in overtaking
Authors:
Samir H. A. Mohammad,
Haneen Farah,
Arkady Zgonnikov
Abstract:
Driving automation holds significant potential for enhancing traffic safety. However, effectively handling interactions with human drivers in mixed traffic remains a challenging task. Several models exist that attempt to capture human behavior in traffic interactions, often focusing on gap acceptance. However, it is not clear how models of an individual driver's gap acceptance can be translated to…
▽ More
Driving automation holds significant potential for enhancing traffic safety. However, effectively handling interactions with human drivers in mixed traffic remains a challenging task. Several models exist that attempt to capture human behavior in traffic interactions, often focusing on gap acceptance. However, it is not clear how models of an individual driver's gap acceptance can be translated to dynamic human-AV interactions in the context of high-speed scenarios like overtaking. In this study, we address this issue by employing a cognitive process approach to describe the dynamic interactions by the oncoming vehicle during overtaking maneuvers. Our findings reveal that by incorporating an initial decision-making bias dependent on the initial velocity into existing drift-diffusion models, we can accurately describe the qualitative patterns of overtaking gap acceptance observed previously. Our results demonstrate the potential of the cognitive process approach in modeling human overtaking behavior when the oncoming vehicle is an AV. To this end, this study contributes to the development of effective strategies for ensuring safe and efficient overtaking interactions between human drivers and AVs.
△ Less
Submitted 8 June, 2023;
originally announced June 2023.
-
Performance Analysis of DNA Crossbar Arrays for High-Density Memory Storage Applications
Authors:
Arpan De,
Hashem Mohammad,
Yiren Wang,
Rajkumar Kubendran,
Arindam K. Das,
M. P. Anantram
Abstract:
Deoxyribonucleic acid (DNA) has emerged as a promising building block for next-generation ultra-high density storage devices. Although DNA has high durability and extremely high density in nature, its potential as the basis of storage devices is currently hindered by limitations such as expensive and complex fabrication processes and time-consuming read-write operations. In this article, we propos…
▽ More
Deoxyribonucleic acid (DNA) has emerged as a promising building block for next-generation ultra-high density storage devices. Although DNA has high durability and extremely high density in nature, its potential as the basis of storage devices is currently hindered by limitations such as expensive and complex fabrication processes and time-consuming read-write operations. In this article, we propose the use of a DNA crossbar array architecture for an electrically readable Read-Only Memory (DNA-ROM). While information can be written error-free to a DNA-ROM array using appropriate sequence encoding, its read accuracy can be affected by several factors such as array size, interconnect resistance, and Fermi energy deviations from HOMO levels of DNA strands employed in the crossbar. We study the impact of array size and interconnect resistance on the bit error rate of a DNA-ROM array through extensive Monte Carlo simulations. We have also analyzed the performance of our proposed DNA crossbar array for an image storage application, as a function of array size and interconnect resistance. While we expect that future advances in bioengineering and materials science will address some of the fabrication challenges associated with DNA crossbar arrays, we believe that the comprehensive body of results we present in this paper establishes the technical viability of DNA crossbar arrays as low-power, high-density storage devices. Finally, our analysis of array performance vis-a-vis interconnect resistance should provide valuable insights into aspects of the fabrication process such as the proper choice of interconnects necessary for ensuring high read accuracies.
△ Less
Submitted 6 April, 2023;
originally announced April 2023.
-
Automated Fidelity Assessment for Strategy Training in Inpatient Rehabilitation using Natural Language Processing
Authors:
Hunter Osterhoudt,
Courtney E. Schneider,
Haneef A Mohammad,
Minmei Shih,
Alexandra E. Harper,
Leming Zhou,
Elizabeth R Skidmore,
Yanshan Wang
Abstract:
Strategy training is a multidisciplinary rehabilitation approach that teaches skills to reduce disability among those with cognitive impairments following a stroke. Strategy training has been shown in randomized, controlled clinical trials to be a more feasible and efficacious intervention for promoting independence than traditional rehabilitation approaches. A standardized fidelity assessment is…
▽ More
Strategy training is a multidisciplinary rehabilitation approach that teaches skills to reduce disability among those with cognitive impairments following a stroke. Strategy training has been shown in randomized, controlled clinical trials to be a more feasible and efficacious intervention for promoting independence than traditional rehabilitation approaches. A standardized fidelity assessment is used to measure adherence to treatment principles by examining guided and directed verbal cues in video recordings of rehabilitation sessions. Although the fidelity assessment for detecting guided and directed verbal cues is valid and feasible for single-site studies, it can become labor intensive, time consuming, and expensive in large, multi-site pragmatic trials. To address this challenge to widespread strategy training implementation, we leveraged natural language processing (NLP) techniques to automate the strategy training fidelity assessment, i.e., to automatically identify guided and directed verbal cues from video recordings of rehabilitation sessions. We developed a rule-based NLP algorithm, a long-short term memory (LSTM) model, and a bidirectional encoder representation from transformers (BERT) model for this task. The best performance was achieved by the BERT model with a 0.8075 F1-score. This BERT model was verified on an external validation dataset collected from a separate major regional health system and achieved an F1 score of 0.8259, which shows that the BERT model generalizes well. The findings from this study hold widespread promise in psychology and rehabilitation intervention research and practice.
△ Less
Submitted 24 January, 2023; v1 submitted 14 September, 2022;
originally announced September 2022.
-
Extraction of Sleep Information from Clinical Notes of Patients with Alzheimer's Disease Using Natural Language Processing
Authors:
Sonish Sivarajkumar,
Thomas Yu CHow Tam,
Haneef Ahamed Mohammad,
Samual Viggiano,
David Oniani,
Shyam Visweswaran,
Yanshan Wang
Abstract:
Alzheimer's Disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information…
▽ More
Alzheimer's Disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience. A gold standard dataset is created from manual annotation of 570 randomly sampled clinical note documents from the adSLEEP, a corpus of 192,000 de-identified clinical notes of 7,266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based Natural Language Processing (NLP) algorithm, machine learning models, and Large Language Model(LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, nap**, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset. Rule-based NLP algorithm achieved the best performance of F1 across all sleep-related concepts. In terms of Positive Predictive Value (PPV), rule-based NLP algorithm achieved 1.00 for daytime sleepiness and sleep duration, machine learning models: 0.95 and for nap**, 0.86 for bad sleep quality and 0.90 for snoring; and LLAMA2 with finetuning achieved PPV of 0.93 for Night Wakings, 0.89 for sleep problem, and 1.00 for sleep duration. The results show that the rule-based NLP algorithm consistently achieved the best performance for all sleep concepts. This study focused on the clinical notes of patients with AD, but could be extended to general sleep information extraction for other diseases.
△ Less
Submitted 15 March, 2024; v1 submitted 8 March, 2022;
originally announced April 2022.
-
Reproducing BowNet: Learning Representations by Predicting Bags of Visual Words
Authors:
Harry Nguyen,
Stone Yun,
Hisham Mohammad
Abstract:
This work aims to reproduce results from the CVPR 2020 paper by Gidaris et al. Self-supervised learning (SSL) is used to learn feature representations of an image using an unlabeled dataset. This work proposes to use bag-of-words (BoW) deep feature descriptors as a self-supervised learning target to learn robust, deep representations. BowNet is trained to reconstruct the histogram of visual words…
▽ More
This work aims to reproduce results from the CVPR 2020 paper by Gidaris et al. Self-supervised learning (SSL) is used to learn feature representations of an image using an unlabeled dataset. This work proposes to use bag-of-words (BoW) deep feature descriptors as a self-supervised learning target to learn robust, deep representations. BowNet is trained to reconstruct the histogram of visual words (ie. the deep BoW descriptor) of a reference image when presented a perturbed version of the image as input. Thus, this method aims to learn perturbation-invariant and context-aware image features that can be useful for few-shot tasks or supervised downstream tasks. In the paper, the author describes BowNet as a network consisting of a convolutional feature extractor $Φ(\cdot)$ and a Dense-softmax layer $Ω(\cdot)$ trained to predict BoW features from images. After BoW training, the features of $Φ$ are used in downstream tasks. For this challenge we were trying to build and train a network that could reproduce the CIFAR-100 accuracy improvements reported in the original paper. However, we were unsuccessful in reproducing an accuracy improvement comparable to what the authors mentioned. This could be for a variety of factors and we believe that time constraints were the primary bottleneck.
△ Less
Submitted 14 January, 2022; v1 submitted 10 January, 2022;
originally announced January 2022.
-
A New Localized Network Based Routing Model in Computer and Communication networks
Authors:
Abdulbaset H. Mohammad
Abstract:
In view of the fact that routing algorithms are network layer entities and the varying performance of any routing algorithm depends on the underlying networks. Localized routing algorithms avoid the problems associated with the maintenance of global network state by using statistics of flow blocking probabilities. We developed a new network parameter that can be used to predict which network topol…
▽ More
In view of the fact that routing algorithms are network layer entities and the varying performance of any routing algorithm depends on the underlying networks. Localized routing algorithms avoid the problems associated with the maintenance of global network state by using statistics of flow blocking probabilities. We developed a new network parameter that can be used to predict which network topology gives better performance on the quality of localized QoS routing algorithms. Using this parameter we explore a simple model that can be rewired to introduce increasing the performance. We find that this model have small characteristic path length. Simulations of random and complex networks used to show that the performance is significantly affected by the level of connectivity.
△ Less
Submitted 17 March, 2011;
originally announced March 2011.