-
Beyond words and actions: Exploring Multimodal Analytics and Collaboration in the Digital Age
Authors:
Diego Miranda,
Rene Noel,
Jaime Godoy,
Carlos Escobedo,
Cristian Cechinel,
Roberto Munoz
Abstract:
This article explores Multimodal Analytics' use in assessing communication within agile software development, particularly through planning poker, to understand collaborative behavior. Multimodal Analytics examines verbal, paraverbal, and non-verbal communication, crucial for effective collaboration in software engineering, which demands efficient communication, cooperation, and coordination. The study focuses on how planning poker influences speaking time and attention among team members by utilizing advanced audiovisual data analysis technologies. Results indicate that while planning poker doesn't significantly change total speaking or attention time, it leads to a more equitable speaking time distribution, highlighting its benefit in enhancing equitable team participation. These findings emphasize planning poker's role in improving software team collaboration and suggest multimodal analytics' potential to explore new aspects of team communication. This research contributes to better understanding coordination techniques' impact in software development and team education, proposing future investigations into optimizing team collaboration and performance through alternative coordination techniques and multimodal analysis across different collaborative settings.
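The abstract does not say which measure of speaking-time equity was used. As one illustration, assume equity is summarized with a Gini coefficient over per-member speaking time; this choice and the example numbers are ours, not the authors'. The hypothetical meetings below have the same total speaking time but different distributions, which mirrors the kind of result the abstract describes.

```python
def gini(speaking_times):
    """Gini coefficient of non-negative speaking times.

    0.0 means every member spoke equally; (n - 1) / n means one member did all the talking.
    """
    xs = sorted(speaking_times)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on ascending data with 1-based ranks i:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-member speaking seconds for one meeting (invented numbers).
without_poker = [300, 60, 40, 20]
with_poker = [120, 110, 100, 90]
print(f"without: {gini(without_poker):.3f}, with: {gini(with_poker):.3f}")
```

Both hypothetical lists sum to 420 seconds, so total speaking time is unchanged, while the second list yields a Gini value much closer to 0.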
Submitted 23 June, 2024;
originally announced June 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Authors:
BigScience Workshop,
Teven Le Scao,
Angela Fan,
Christopher Akiki,
Ellie Pavlick,
Suzana Ilić,
Daniel Hesslow,
Roman Castagné,
Alexandra Sasha Luccioni,
François Yvon,
Matthias Gallé,
Jonathan Tow,
Alexander M. Rush,
Stella Biderman,
Albert Webson,
Pawan Sasanka Ammanamanchi,
Thomas Wang,
Benoît Sagot,
Niklas Muennighoff,
Albert Villanova del Moral,
Olatunji Ruwase,
Rachel Bawden,
Stas Bekman,
Angelina McMillan-Major
, et al. (369 additional authors not shown)
Abstract:
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Submitted 27 June, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Interactivism in Spoken Dialogue Systems
Authors:
T. Rodríguez Muñoz,
Emily Y. J. Ip,
G. Huang,
R. K. Moore
Abstract:
The interactivism model introduces a dynamic approach to language, communication and cognition. In this work, we explore this fundamental theory in the context of dialogue modelling for spoken dialogue systems (SDS). To extend such a theoretical framework, we present a set of design principles which adhere to central psycholinguistic and communication theories to achieve interactivism in SDS. From these, key ideas are linked to constitute the basis of our proposed design principles.
Submitted 28 September, 2022; v1 submitted 27 September, 2022;
originally announced September 2022.
-
2D Grid Map Generation for Deep-Learning-based Navigation Approaches
Authors:
Gabriel O. Flores-Aquino,
Jheison Duvier Díaz Ortega,
Ricardo Yahir Almazan Arvizu,
Raúl López Muñoz,
O. Octavio Gutierrez-Frias,
J. Irving Vasquez-Gomez
Abstract:
In the last decade, autonomous navigation for robotics has benefited from deep learning and other approaches based on machine learning. These approaches have demonstrated significant advantages in robotics performance. However, they require large amounts of data to infer knowledge. In this paper, we present an algorithm for building 2D maps with attributes that make them useful for training and testing machine-learning-based approaches. The maps are based on dungeon environments, where several random rooms are built and then connected. In addition, we provide a dataset of 10,000 maps produced by the proposed algorithm, together with a description containing extensive information for algorithm evaluation. This information includes validation of path existence, the best path and distances, among other attributes. We believe that these maps and their related information can be very useful for robotics enthusiasts and researchers who want to test deep learning approaches. The dataset is available at https://github.com/gbriel21/map2D_dataSet.git
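The abstract describes the maps only at a high level (random rooms that are then connected, plus path-existence and best-path information). The sketch below is our own simplified illustration, not the authors' algorithm or dataset format: it assumes rectangular rooms at random positions (overlaps allowed), joins consecutive room centers with L-shaped corridors, and validates a path with 4-connected breadth-first search. The names `generate_map` and `shortest_path_length` are hypothetical.

```python
import random
from collections import deque

FREE, WALL = 0, 1


def generate_map(width, height, n_rooms, seed=None):
    """Carve random rectangular rooms into a wall grid and chain them with corridors.

    Assumes width and height are at least 8 so that every room fits inside the border.
    """
    rng = random.Random(seed)
    grid = [[WALL] * width for _ in range(height)]
    centers = []
    for _ in range(n_rooms):
        w, h = rng.randint(3, 6), rng.randint(3, 6)
        x, y = rng.randint(1, width - w - 1), rng.randint(1, height - h - 1)
        for r in range(y, y + h):
            for c in range(x, x + w):
                grid[r][c] = FREE
        centers.append((y + h // 2, x + w // 2))
    # L-shaped corridor: along row r1 to column c2, then along column c2 to row r2.
    for (r1, c1), (r2, c2) in zip(centers, centers[1:]):
        for c in range(min(c1, c2), max(c1, c2) + 1):
            grid[r1][c] = FREE
        for r in range(min(r1, r2), max(r1, r2) + 1):
            grid[r][c2] = FREE
    return grid, centers


def shortest_path_length(grid, start, goal):
    """4-connected BFS: number of steps from start to goal, or None if no path exists."""
    rows, cols = len(grid), len(grid[0])
    if grid[start[0]][start[1]] != FREE or grid[goal[0]][goal[1]] != FREE:
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == FREE and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None


grid, centers = generate_map(40, 30, 5, seed=1)
print(shortest_path_length(grid, centers[0], centers[-1]))
```

Because every room is chained to the next one, all room centers are mutually reachable by construction; the BFS check is still what a dataset would store as "path exists" plus "best path length".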
Submitted 4 December, 2021; v1 submitted 25 October, 2021;
originally announced October 2021.
-
End-to-end 5G services via an SDN/NFV-based multi-tenant network and cloud testbed
Authors:
Raul Muñoz,
Josep Mangues-Bafalluy,
Nikolaos Bartzoudis,
Ricard Vilalta,
Ricardo Martínez,
Ramon Casellas,
Nicola Baldo,
José Núñez-Martínez,
Manuel Requena-Esteso,
Oriol Font-Bach,
Marco Miozzo,
Pol Henarejos,
Ana Pérez-Neira,
Miquel Payaró
Abstract:
A main requirement of 5G is a highly flexible, ultra-low-latency, ultra-high-bandwidth virtualized infrastructure for delivering end-to-end services. This requirement can be met by efficiently integrating all network segments (radio access, aggregation and core) with heterogeneous wireless and optical technologies (5G, mmWave, LTE/LTE-A, Wi-Fi, Ethernet, MPLS, WDM, software-defined optical transmission, etc.), and massive computing and storage cloud services (offered in edge/core data centers). This paper introduces a preliminary architecture aimed at integrating three consolidated, standalone experimental infrastructures at CTTC in order to deploy the end-to-end, top-to-bottom converged infrastructure described above for testing and developing advanced 5G services.
Submitted 20 March, 2018;
originally announced March 2018.
-
The CTTC 5G end-to-end experimental platform: Integrating heterogeneous wireless/optical networks, distributed cloud, and IoT devices
Authors:
Raul Muñóz,
Josep Mangues,
Ricard Vilalta,
Christos Verikoukis,
Jesús Alonso-Zarate,
Nikolaos Bartzoudis,
Apostolos Georgiadis,
Miquel Payaró,
Ana Pérez-Neira,
Ramon Casellas,
Ricardo Martínez,
José Núñez-Martínez,
Manuel Requena-Esteso,
David Pubill,
Oriol Font-Bach,
Pol Henarejos,
Jordi Serra,
Francisco Vazquez-Gallego
Abstract:
The Internet of Things (IoT) will facilitate a wide variety of applications in different domains, such as smart cities, smart grids, industrial automation (Industry 4.0), smart driving, assistance of the elderly, and home automation. Billions of heterogeneous smart devices with different application requirements will be connected to the networks and will generate huge aggregated volumes of data that will be processed in distributed cloud infrastructures. On the other hand, there is also a general trend to deploy functions as software (SW) instances in cloud infrastructures [e.g., network function virtualization (NFV) or mobile edge computing (MEC)]. Thus, the next generation of mobile networks, the fifth-generation (5G), will need not only to develop new radio interfaces or waveforms to cope with the expected traffic growth but also to integrate heterogeneous networks from end to end (E2E) with distributed cloud resources to deliver E2E IoT and mobile services. This article presents the E2E 5G platform that is being developed by the Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), the first known platform capable of reproducing such an ambitious scenario.
Submitted 20 March, 2018;
originally announced March 2018.
-
Study of the characteristic parameters of the normal voices of Argentinian speakers
Authors:
E. V. Bonzi,
G. B. Grad,
A. M. Maggi,
M. R. Muñóz
Abstract:
The voice laboratory makes it possible to study human voices with an objective, noninvasive method. In this work, we studied parameters of the human voice, such as pitch, formants, jitter, shimmer and harmonics-to-noise ratio, in a group of young people. These statistics were obtained from Argentinian speakers.
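The abstract does not state how the parameters were computed. As a sketch, assuming the common Praat-style "local" definitions (mean absolute difference between consecutive cycles divided by the mean value), jitter and shimmer can be derived from per-cycle periods and peak amplitudes. The function names, the estimate of mean F0 as the reciprocal of the mean period, and the cycle values are all illustrative assumptions.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two consecutive cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def voice_measures(periods_s, amplitudes):
    """Mean F0 (Hz), local jitter and local shimmer from per-cycle measurements."""
    mean_period = sum(periods_s) / len(periods_s)
    return {
        "mean_f0_hz": 1.0 / mean_period,  # assumption: F0 taken as 1 / mean period
        "jitter_local": local_perturbation(periods_s),  # cycle-to-cycle period variation
        "shimmer_local": local_perturbation(amplitudes),  # cycle-to-cycle amplitude variation
    }


# Hypothetical glottal cycles of a voice near 100 Hz (invented values).
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
amps = [0.80, 0.78, 0.81, 0.79, 0.80]
print(voice_measures(periods, amps))
```

In practice the period and amplitude sequences come from a pitch tracker; the harmonics-to-noise ratio and formants need spectral analysis and are not covered by this sketch.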
Submitted 18 December, 2014;
originally announced August 2015.
-
Enhancing QA Systems with Complex Temporal Question Processing Capabilities
Authors:
Estela Saquete,
Jose Luis Vicedo,
Patricio Martínez-Barco,
Rafael Muñoz,
Hector Llorens
Abstract:
This paper presents a multilayered architecture that enhances the capabilities of current QA systems and allows different types of complex questions or queries to be processed. The answers to these questions need to be gathered from factual information scattered throughout different documents. Specifically, we designed a specialized layer to process the different types of temporal questions. Complex temporal questions are first decomposed into simple questions, according to the temporal relations expressed in the original question. The answers to the resulting simple questions are then recomposed so that they satisfy the temporal restrictions of the original complex question. A novel aspect of this approach lies in the decomposition, which uses a minimal quantity of resources, with the final aim of obtaining a portable platform that is easily extensible to other languages. In this paper we also present a methodology for evaluating the decomposition of the questions as well as the ability of the implemented temporal layer to perform at a multilingual level. The temporal layer was first implemented for English, then evaluated and compared with: a) a general-purpose QA system (F-measure 65.47% for QA plus English temporal layer vs. 38.01% for the general QA system), and b) a well-known QA system. Much better results were obtained for temporal questions with the multilayered system. This system was therefore extended to Spanish, and very good results were again obtained in the evaluation (F-measure 40.36% for QA plus Spanish temporal layer vs. 22.94% for the general QA system).
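The decompose-then-recompose idea can be sketched as follows. This is a deliberately simplified, hypothetical illustration, not the paper's temporal layer: it assumes a question of the form "Q1 before/after/during REF?", splits it with a regular expression into a main question and a "When was REF?" sub-question, and then keeps only the Q1 answers whose date interval satisfies the temporal signal. The answer data stands in for what a base QA system might return and is invented.

```python
import re
from datetime import date

# Matches "<main question> <signal> <reference event>?" for three temporal signals.
PATTERN = re.compile(r"^(.*?)\s+(before|after|during)\s+(.+?)\?*$", re.IGNORECASE)


def decompose(question):
    """Split a complex temporal question into (Q1, signal, Q2); Q2 asks when the reference happened."""
    m = PATTERN.match(question.strip())
    if m is None:
        return question, None, None  # not a complex temporal question
    main, signal, ref = m.groups()
    return f"{main}?", signal.lower(), f"When was {ref}?"


def satisfies(signal, candidate, reference):
    """Check the temporal relation between two (start, end) date intervals."""
    (c_start, c_end), (r_start, r_end) = candidate, reference
    if signal == "before":
        return c_end < r_start
    if signal == "after":
        return c_start > r_end
    if signal == "during":
        return r_start <= c_start and c_end <= r_end
    raise ValueError(f"unknown temporal signal: {signal}")


def recompose(signal, q1_answers, q2_interval):
    """Keep only the Q1 answers whose interval satisfies the relation with Q2's answer."""
    return [name for name, interval in q1_answers if satisfies(signal, interval, q2_interval)]


q1, signal, q2 = decompose("Who led the project after the 2010 merger?")
# Hypothetical answers a base QA system might return for Q1 and Q2 (invented data).
q1_answers = [("Alice", (date(2005, 1, 1), date(2009, 12, 31))),
              ("Bob", (date(2011, 1, 1), date(2015, 6, 30)))]
q2_interval = (date(2010, 3, 1), date(2010, 3, 1))
print(q1, "|", q2, "->", recompose(signal, q1_answers, q2_interval))
```

A real system would need many more temporal signals and question shapes; the point here is only that each simple question can go to an unmodified QA system while the temporal constraint is applied afterwards.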
Submitted 15 January, 2014;
originally announced January 2014.
-
Virtual Worlds as a Support to Engineering Teaching
Authors:
Roberto Muñoz,
Marta Barría,
Cristian Rusu
Abstract:
Virtual Worlds (VWs) are an emerging technology used by a growing number of educational institutions around the world. A VW is at once an environment, a way of learning and an educational tool that allows different levels of online interaction. In the course "Programming I" of the Informatics Engineering program at Universidad de Valparaíso, we conducted a pilot experience with the VW Second Life in order to evaluate the potential of using VWs in teaching practice.
Submitted 25 November, 2011;
originally announced November 2011.
-
Unbounded Contention Resolution in Multiple-Access Channels
Authors:
Antonio Fernández Anta,
Miguel A. Mosteiro,
Jorge Ramón Muñoz
Abstract:
A frequent problem in settings where a unique resource must be shared among users is how to resolve the contention that arises when all of them must use it but the resource allows only one user at a time. Efficient solutions to this problem apply to a myriad of settings, such as radio communication networks or databases. For the case where the number of users is unknown, recent work has yielded fruitful results for local area networks and radio networks, although either a (possibly loose) upper bound on the number of users needs to be known, or the solution is suboptimal, or it is only implicit or embedded in other problems, with bounds proved only asymptotically. In this paper, under the assumption that neither collision detection nor information on the number of contenders is available, we present a novel protocol for contention resolution in radio networks, and we recreate a protocol previously used for other problems, tailoring the constants to our needs. In contrast with previous work, both protocols are proved to be optimal up to a small constant factor, with high probability, for a sufficiently large number of contenders. Additionally, the protocols are evaluated and contrasted with previous work through extensive simulations. The evaluation shows that the complexity bounds obtained by the analysis are rather tight, and that both proposed protocols have small and predictable complexity for many system sizes (unlike previous proposals).
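The simulations mentioned above compare protocols by the number of slots needed until every contender has used the channel alone once. The harness below is a hypothetical sketch of that kind of experiment using a generic windowed exponential backoff, which is plainly a swapped-in baseline and not either of the paper's protocols. It assumes a slot succeeds only when exactly one contender transmits, that there is no collision detection, and that a successful transmitter learns of its success (for example through an acknowledgment) and then stops.

```python
import random


def windowed_backoff_slots(k, seed=None):
    """Slots needed until all k contenders have transmitted alone once.

    Generic windowed exponential backoff (a baseline, not the paper's protocols):
    in window i, of 2**i slots, every still-active contender picks one slot
    uniformly at random. A slot chosen by exactly one contender is a success and
    that contender leaves; slots chosen by two or more contenders are collisions.
    """
    rng = random.Random(seed)
    active, slots, window = k, 0, 1
    while active > 0:
        counts = [0] * window
        for _ in range(active):
            counts[rng.randrange(window)] += 1
        active -= sum(1 for c in counts if c == 1)  # lone transmitters succeed
        slots += window
        window *= 2
    return slots


for k in (10, 100, 1000):
    runs = [windowed_backoff_slots(k, seed=s) for s in range(20)]
    print(k, sum(runs) / len(runs) / k)  # average slots per contender
```

Plain exponential backoff is commonly reported to need on the order of k log k slots to resolve k contenders, so the slots-per-contender ratio printed here tends to grow with k; a protocol that is optimal up to a constant factor would keep that ratio roughly flat, which is the property to look for when swapping other protocols into the same harness.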
Submitted 1 July, 2011;
originally announced July 2011.