Search | arXiv e-print repository

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2308.02041 [pdf]

Regulating AI: Applying insights from behavioural economics and psychology to the application of article 5 of the EU AI Act

Authors: Huixin Zhong, Eamonn O'Neill, Janina A. Hoffmann

Abstract: Article 5 of the European Union's Artificial Intelligence Act is intended to regulate AI use to prevent potentially harmful consequences. Nevertheless, applying this legislation practically is likely to be challenging because of ambiguously used terminologies and because it fails to specify which manipulation techniques may be invoked by AI, potentially leading to significant harm. This paper aims to bridge this gap by defining key terms and demonstrating how AI may invoke these techniques, drawing from insights in psychology and behavioural economics. First, this paper provides definitions of the terms "subliminal techniques", "manipulative techniques" and "deceptive techniques". Secondly, we identified from the literature in cognitive psychology and behavioural economics three subliminal and five manipulative techniques and exemplify how AI might implement these techniques to manipulate users in real-world case scenarios. These illustrations may serve as a practical guide for stakeholders to detect cases of AI manipulation and consequently devise preventive measures. Article 5 has also been criticised for offering inadequate protection. We critically assess the protection offered by Article 5, proposing specific revisions to paragraph 1, points (a) and (b) of Article 5 to increase its protective effectiveness.

Submitted 25 February, 2024; v1 submitted 24 July, 2023; originally announced August 2023.

Comments: This paper was accepted for publication at AAAI 2024 in December 2023

arXiv:2306.10854 [pdf, other]

Performance of data-driven inner speech decoding with same-task EEG-fMRI data fusion and bimodal models

Authors: Holly Wilson, Scott Wellington, Foteini Simistira Liwicki, Vibha Gupta, Rajkumar Saini, Kanjar De, Nosheen Abid, Sumit Rakesh, Johan Eriksson, Oliver Watts, Xi Chen, Mohammad Golbabaee, Michael J. Proulx, Marcus Liwicki, Eamonn O'Neill, Benjamin Metcalfe

Abstract: Decoding inner speech from the brain signal via hybridisation of fMRI and EEG data is explored to investigate the performance benefits over unimodal models. Two different bimodal fusion approaches are examined: concatenation of probability vectors output from unimodal fMRI and EEG machine learning models, and data fusion with feature engineering. Same task inner speech data are recorded from four participants, and different processing strategies are compared and contrasted to previously-employed hybridisation methods. Data across participants are discovered to encode different underlying structures, which results in varying decoding performances between subject-dependent fusion models. Decoding performance is demonstrated as improved when pursuing bimodal fMRI-EEG fusion strategies, if the data show underlying structure.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2305.07389 [pdf, other]

Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes

Authors: Emma O'Neill, Julie Carson-Berndsen

Abstract: Automatic Speech Recognition (ASR) systems exhibit the best performance on speech that is similar to that on which they were trained. As such, underrepresented varieties including regional dialects, minority-speakers, and low-resource languages, see much higher word error rates (WERs) than those varieties seen as 'prestigious', 'mainstream', or 'standard'. This can act as a barrier to incorporating ASR technology into the annotation process for large-scale linguistic research since the manual correction of the erroneous automated transcripts can be just as time and resource consuming as manual transcriptions. A deeper understanding of the behaviour of an ASR system is thus beneficial from a speech technology standpoint, in terms of improving ASR accuracy, and from an annotation standpoint, where knowing the likely errors made by an ASR system can aid in this manual correction. This work demonstrates a method of probing an ASR system to discover how it handles phonetic variation across a number of L2 Englishes. Specifically, how particular phonetic realisations which were rare or absent in the system's training data can lead to phoneme level misrecognitions and contribute to higher WERs. It is demonstrated that the behaviour of the ASR is systematic and consistent across speakers with similar spoken varieties (in this case the same L1) and phoneme substitution errors are typically in agreement with human annotators. By identifying problematic productions, specific weaknesses can be addressed by sourcing such realisations for training and fine-tuning, thus making the system more robust to pronunciation variation.

Submitted 12 May, 2023; originally announced May 2023.

arXiv:1810.05313 [pdf, ps, other]

doi 10.1016/j.cam.2018.10.019

Xorshift1024*, Xorshift1024+, Xorshift128+ and Xoroshiro128+ Fail Statistical Tests for Linearity

Authors: Daniel Lemire, Melissa E. O'Neill

Abstract: L'Ecuyer & Simard's Big Crush statistical test suite has revealed statistical flaws in many popular random number generators including Marsaglia's Xorshift generators. Vigna recently proposed some 64-bit variations on the Xorshift scheme that are further scrambled (i.e., Xorshift1024*, Xorshift1024+, Xorshift128+, Xoroshiro128+). Unlike their unscrambled counterparts, they pass Big Crush when interleaving blocks of 32 bits for each 64-bit word (most significant, least significant, most significant, least significant, etc.). We report that these scrambled generators systematically fail Big Crush---specifically the linear-complexity and matrix-rank tests that detect linearity---when taking the 32 lowest-order bits in reverse order from each 64-bit word.

Submitted 25 October, 2018; v1 submitted 11 October, 2018; originally announced October 2018.

Journal ref: Computational and Applied Mathematics 350, 2019

arXiv:1603.08103 [pdf]

Towards the Design of Effective Freehand Gestural Interaction for Interactive TV

Authors: Gang Ren, Wenbin Li, Eamonn O'Neill

Abstract: As interactive devices become pervasive, people are beginning to look for more advanced interaction with televisions in the living room. Interactive television has the potential to offer a very engaging experience. But most common user tasks are still challenging with such systems, such as menu selection or text input. And little work has been done on understanding and supporting the effective design of freehand interaction with a TV in the living room. In this paper, we perform two studies investigating freehand gestural interaction with a consumer level sensor, which is suitable for TV scenarios. In the first study, we investigate a range of design factors for tiled layout menu selection, including wearable feedback, push gesture depth, target size and position in motor space. The results show that tactile and audio feedback have no significant effect on performance and preference, and these results inform potential designs for high selection performance. In the second study, we investigate a common TV user task of text input using freehand gesture. We design and evaluate two virtual keyboard layouts and three freehand selection methods. Results show that ease of use and error tolerance can both be achieved using a text entry method utilizing a dual circle layout and an expanding target selection technique. Finally, we propose design guidelines for effective, usable and comfortable freehand gestural interaction for interactive TV based on the findings.

Submitted 26 March, 2016; originally announced March 2016.

Comments: Preprint version of our paper accepted by Journal of Intelligent and Fuzzy Systems

arXiv:1502.05094 [pdf, other]

Observationally Cooperative Multithreading

Authors: Christopher A. Stone, Melissa E. O'Neill, Sonja A. Bohr, Adam M. Cozzette, M. Joe DeBlasio, Julia Matsieva, Stuart A. Pernsteiner, Ari D. Schumer

Abstract: Despite widespread interest in multicore computing, concurrency models in mainstream languages often lead to subtle, error-prone code. Observationally Cooperative Multithreading (OCM) is a new approach to shared-memory parallelism. Programmers write code using the well-understood cooperative (i.e., nonpreemptive) multithreading model for uniprocessors. OCM then allows threads to run in parallel, so long as results remain consistent with the cooperative model. Programmers benefit because they can reason largely sequentially. Remaining interthread interactions are far less chaotic than in other models, permitting easier reasoning and debugging. Programmers can also defer the choice of concurrency-control mechanism (e.g., locks or transactions) until after they have written their programs, at which point they can compare concurrency-control strategies and choose the one that offers the best performance. Implementers and researchers also benefit from the agnostic nature of OCM -- it provides a level of abstraction to investigate, compare, and combine a variety of interesting concurrency-control techniques.

Submitted 17 February, 2015; originally announced February 2015.

ACM Class: D.1.3; D.3.2

arXiv:0804.3103 [pdf]

Size matters: performance declines if your pixels are too big or too small

Authors: Vassilis Kostakos, Eamonn O'Neill

Abstract: We present a conceptual model that describes the effect of pixel size on target acquisition. We demonstrate the use of our conceptual model by applying it to predict and explain the results of an experiment to evaluate users' performance in a target acquisition task involving three distinct display sizes: standard desktop, small and large displays. The results indicate that users are fastest on standard desktop displays, undershoots are the most common error on small displays and overshoots are the most common error on large displays. We propose heuristics to maintain usability when changing displays. Finally, we contribute to the growing body of evidence that amplitude does affect performance in a display-based pointing task.

Submitted 18 April, 2008; originally announced April 2008.

Comments: 10 pages, 13 figures, 7 tables

arXiv:0709.0223 [pdf]

doi 10.1145/1721831.1721833

Brief encounter networks

Authors: Vassilis Kostakos, Eamonn O'Neill, Alan Penn

Abstract: Many complex human and natural phenomena can usefully be represented as networks describing the relationships between individuals. While these relationships are typically intermittent, previous research has used network representations that aggregate the relationships at discrete intervals. However, such an aggregation discards important temporal information, thus inhibiting our understanding of the network's dynamic behaviour and evolution. We have recorded patterns of human urban encounter using Bluetooth technology thus retaining the temporal properties of this network. Here we show how this temporal information influences the structural properties of the network. We show that the temporal properties of human urban encounter are scale-free, leading to an overwhelming proportion of brief encounters between individuals. While previous research has shown preferential attachment to result in scale-free connectivity in aggregated network data, we found that scale-free connectivity results from the temporal properties of the network. In addition, we show that brief encounters act as weak social ties in the diffusion of non-expiring information, yet persistent encounters provide the means for sustaining time-expiring information through a network.

Submitted 3 September, 2007; originally announced September 2007.

Comments: 8 pages, 6 figures

Journal ref: ACM Transactions on Computer Human Interaction, 17(1):1-38, 2010

arXiv:cs/0404017 [pdf, ps, other]

doi 10.1117/12.548001

Exploring tradeoffs in pleiotropy and redundancy using evolutionary computing

Authors: Matthew J. Berryman, Wei-Li Khoo, Hiep Nguyen, Erin O'Neill, Andrew Allison, Derek Abbott

Abstract: Evolutionary computation algorithms are increasingly being used to solve optimization problems as they have many advantages over traditional optimization algorithms. In this paper we use evolutionary computation to study the trade-off between pleiotropy and redundancy in a client-server based network. Pleiotropy is a term used to describe components that perform multiple tasks, while redundancy refers to multiple components performing one same task. Pleiotropy reduces cost but lacks robustness, while redundancy increases network reliability but is more costly, as together, pleiotropy and redundancy build flexibility and robustness into systems. Therefore it is desirable to have a network that contains a balance between pleiotropy and redundancy. We explore how factors such as link failure probability, repair rates, and the size of the network influence the design choices that we explore using genetic algorithms.

Submitted 7 April, 2004; originally announced April 2004.

Comments: 10 pages, 6 figures

ACM Class: G.1.6; C.2.1

Journal ref: Proc. SPIE 5275, BioMEMS and Nanotechnology, Ed. Dan V. Nicolau, Perth, Australia, Dec. 2003, pp49-58

Showing 1–10 of 10 results for author: O'Neill, E