-
Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization
Authors:
Mert Esencan,
Tarun Advaith Kumar,
Ata Akbari Asanjan,
P. Aaron Lott,
Masoud Mohseni,
Can Unlu,
Davide Venturelli,
Alan Ho
Abstract:
Recent Large Language Models (LLMs) have demonstrated impressive capabilities at tasks that require human intelligence and are a significant step towards human-like artificial intelligence (AI). Yet the performance of LLMs at reasoning tasks has been subpar and the reasoning capability of LLMs is a matter of significant debate. While it has been shown that the choice of the prompting technique to the LLM can alter its performance on a multitude of tasks, including reasoning, the best performing techniques require human-made prompts with the knowledge of the tasks at hand. We introduce a framework for what we call Combinatorial Reasoning (CR), a fully-automated prompting method, where reasons are sampled from an LLM pipeline and mapped into a Quadratic Unconstrained Binary Optimization (QUBO) problem. The framework investigates whether QUBO solutions can be profitably used to select a useful subset of the reasons to construct a Chain-of-Thought style prompt. We explore the acceleration of CR with specialized solvers. We also investigate the performance of simpler zero-shot strategies such as linear majority rule or random selection of reasons. Our preliminary study indicates that coupling a combinatorial solver to generative AI pipelines is an interesting avenue for AI reasoning and elucidates design principles for future CR methods.
Submitted 19 June, 2024;
originally announced July 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Doped Graphene Quantum Dots UV-Vis Absorption Spectrum: A high-throughput TDDFT study
Authors:
Şener Özönder,
Caner Ünlü,
Cihat Güleryüz,
Levent Trabzon
Abstract:
We report on time-dependent density functional theory (TDDFT) calculations of the excited states of 63 different graphene quantum dots (GQDs) in square shape with side lengths 1 nm, 1.5 nm and 2 nm. We investigate the systematics and trends in the UV-Vis absorption spectra of these GQDs, which are doped with elements B, N, O, S and P at dopant percentages 1.5%, 3%, 5% and 7%. The results show how the peaks in the UV and visible parts of the spectrum as well as the total absorption evolve in the chemical parameter space along the coordinates of size, dopant type and dopant percentage. The absorption spectra calculated here can be used to obtain particular GQD mixture proportions that would yield a desired absorption profile such as flat absorption across the whole visible spectrum or one that is locally peaked around a chosen wavelength.
Submitted 4 June, 2022;
originally announced June 2022.
-
Amphibious Transport of Fluids and Solids by Soft Magnetic Carpets
Authors:
Ahmet F. Demirörs,
Sümeyye Aykut,
Sophia Ganzeboom,
Yuki Meier,
Robert Hardeman,
Joost de Graaf,
Arnold J. T. M. Mathijssen,
Erik Poloni,
Julia A. Carpenter,
Caner Unlu,
Daniel Zenhausern
Abstract:
One of the major challenges in modern robotics is controlling micromanipulation by active and adaptive materials. In the respiratory system, such actuation enables pathogen clearance by means of motile cilia. While various types of artificial cilia have been engineered recently, they often involve complex manufacturing protocols and focus on transporting liquids only. Here, we create soft magnetic carpets via an easy self-assembly route based on the Rosensweig instability. These carpets can transport liquids but also solid objects that are larger and heavier than the artificial cilia, using a crowd-surfing effect. This amphibious transportation is locally and reconfigurably tuneable by simple micromagnets or advanced programmable magnetic fields with a high degree of spatial resolution. We identify and model two surprising cargo reversal effects due to collective ciliary motion and non-trivial elastohydrodynamics. While our active carpets are generally applicable to integrated control systems for transport, mixing and sorting, these effects could also be exploited for microfluidic viscosimetry and elastometry.
Submitted 22 August, 2021;
originally announced August 2021.
-
Role of Metal Centers in Tuning the Electronic Properties of Graphene-Based Conductive Interfaces
Authors:
Silvio Osella,
Małgorzata Kiliszek,
Ersan Harputlu,
Cumhur G. Unlu,
Kasim Ocakoglu,
Bartosz Trzaskowski,
Joanna Kargul
Abstract:
A major bottleneck in the fabrication of efficient bio-organic nanoelectronic devices resides in the strong charge recombination that is present at the different interfaces forming the complex system. An efficient way to overcome this bottleneck is to add a self-assembled monolayer (SAM) of molecules between the biological material and the electrode that promotes an efficient direct electron transfer whilst minimising wasteful processes of charge recombination. In this work, the presence of a pyrene-nitrilotriacetic acid layer carrying different metal centers as SAM physisorbed on graphene is fully described by means of electrochemical analysis, field emission scanning electron microscopy, photoelectrochemical characterisation and theoretical calculations. Our multidisciplinary study reveals that the metal center holds the key role for the efficient electron transfer at the interface. While Ni2+ is responsible for an electron transfer from SAM to graphene, Co2+ and Cu2+ force an opposite transfer, from graphene to SAM. Moreover, since Cu2+ inhibits the electron transfer due to a strong charge recombination, Co2+ seems the transition metal of choice for the efficient electron transfer.
Submitted 22 October, 2019;
originally announced October 2019.