Search | arXiv e-print repository

Beyond words and actions: Exploring Multimodal Analytics and Collaboration in the Digital Age

Authors: Diego Miranda, Rene Noel, Jaime Godoy, Carlos Escobedo, Cristian Cechinel, Roberto Munoz

Abstract: This article explores Multimodal Analytics' use in assessing communication within agile software development, particularly through planning poker, to understand collaborative behavior. Multimodal Analytics examines verbal, paraverbal, and non-verbal communication, crucial for effective collaboration in software engineering, which demands efficient communication, cooperation, and coordination. The study focuses on how planning poker influences speaking time and attention among team members by utilizing advanced audiovisual data analysis technologies. Results indicate that while planning poker doesn't significantly change total speaking or attention time, it leads to a more equitable speaking time distribution, highlighting its benefit in enhancing equitable team participation. These findings emphasize planning poker's role in improving software team collaboration and suggest multimodal analytics' potential to explore new aspects of team communication. This research contributes to better understanding coordination techniques' impact in software development and team education, proposing future investigations into optimizing team collaboration and performance through alternative coordination techniques and multimodal analysis across different collaborative settings.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2403.05530

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2211.05100

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2209.13547

Interactivism in Spoken Dialogue Systems

Authors: T. Rodríguez Muñoz, Emily Y. J. Ip, G. Huang, R. K. Moore

Abstract: The interactivism model introduces a dynamic approach to language, communication and cognition. In this work, we explore this fundamental theory in the context of dialogue modelling for spoken dialogue systems (SDS). To extend such a theoretical framework, we present a set of design principles which adhere to central psycholinguistic and communication theories to achieve interactivism in SDS. From these, key ideas are linked to constitute the basis of our proposed design principles.

Submitted 28 September, 2022; v1 submitted 27 September, 2022; originally announced September 2022.

MSC Class: H.1.2; H.5.2; I.2.11; J.4

Journal ref: In the Proceedings of the 26th Workshop on the Semantics and Pragmatics of Dialogue (SemDial 2022), August 22-24 2022, Dublin, pg 263-265

arXiv:2110.13242

2D Grid Map Generation for Deep-Learning-based Navigation Approaches

Authors: Gabriel O. Flores-Aquino, Jheison Duvier Díaz Ortega, Ricardo Yahir Almazan Arvizu, Raúl López Muñoz, O. Octavio Gutierrez-Frias, J. Irving Vasquez-Gomez

Abstract: In the last decade, autonomous navigation for robotics has been leveraged by deep learning and other approaches based on machine learning. These approaches have demonstrated significant advantages in robotics performance. But they have the disadvantage that they require a lot of data to infer knowledge. In this paper, we present an algorithm for building 2D maps with attributes that make them useful for training and testing machine-learning-based approaches. The maps are based on dungeons environments where several random rooms are built and then those rooms are connected. In addition, we provide a dataset with 10,000 maps produced by the proposed algorithm and a description with extensive information for algorithm evaluation. Such information includes validation of path existence, the best path, distances, among other attributes. We believe that these maps and their related information can be very useful for robotics enthusiasts and researchers who want to test deep learning approaches. The dataset is available at https://github.com/gbriel21/map2D_dataSet.git

Submitted 4 December, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: 6 pages, 4 figures, conference, dataset

arXiv:1803.07328

doi 10.5281/zenodo.398834

End-to-end 5G services via an SDN/NFV-based multi-tenant network and cloud testbed

Authors: Raul Muñoz, Josep Mangues-Bafalluy, Nikolaos Bartzoudis, Ricard Vilalta, Ricardo Martínez, Ramon Casellas, Nicola Baldo, José Núñez-Martínez, Manuel Requena-Esteso, Oriol Font-Bach, Marco Miozzo, Pol Henarejos, Ana Pérez-Neira, Miquel Payaró

Abstract: 5G has a main requirement of highly flexible, ultralow latency and ultra-high bandwidth virtualized infrastructure in order to deliver end-to-end services. This requirement can be met by efficiently integrating all network segments (radio access, aggregation and core) with heterogeneous wireless and optical technologies (5G, mmWave, LTE/LTE-A, Wi-Fi, Ethernet, MPLS, WDM, software-defined optical transmission, etc.), and massive computing and storage cloud services (offered in edge/core data centers). This paper introduces the preliminary architecture aiming at integrating three consolidated and standalone experimental infrastructures at CTTC, in order to deploy the required end-to-end top-to-bottom converged infrastructure pointed out above for testing and developing advanced 5G services.

Submitted 20 March, 2018; originally announced March 2018.

arXiv:1803.07310

doi 10.1109/MVT.2015.2508320

The CTTC 5G end-to-end experimental platform: Integrating heterogeneous wireless/optical networks, distributed cloud, and IoT devices

Authors: Raul Muñóz, Josep Mangues, Ricard Vilalta, Christos Verikoukis, Jesús Alonso-Zarate, Nikolaos Bartzoudis, Apostolos Georgiadis, Miquel Payaró, Ana Pérez-Neira, Ramon Casellas, Ricardo Martínez, José Núñez-Martínez, Manuel Requena-Esteso, David Pubill, Oriol Font-Bach, Pol Henarejos, Jordi Serra, Francisco Vazquez-Gallego

Abstract: The Internet of Things (IoT) will facilitate a wide variety of applications in different domains, such as smart cities, smart grids, industrial automation (Industry 4.0), smart driving, assistance of the elderly, and home automation. Billions of heterogeneous smart devices with different application requirements will be connected to the networks and will generate huge aggregated volumes of data that will be processed in distributed cloud infrastructures. On the other hand, there is also a general trend to deploy functions as software (SW) instances in cloud infrastructures [e.g., network function virtualization (NFV) or mobile edge computing (MEC)]. Thus, the next generation of mobile networks, the fifth-generation (5G), will need not only to develop new radio interfaces or waveforms to cope with the expected traffic growth but also to integrate heterogeneous networks from end to end (E2E) with distributed cloud resources to deliver E2E IoT and mobile services. This article presents the E2E 5G platform that is being developed by the Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), the first known platform capable of reproducing such an ambitious scenario.

Submitted 20 March, 2018; originally announced March 2018.

arXiv:1508.06226

doi 10.4279/PIP.060002

Study of the characteristic parameters of the normal voices of Argentinian speakers

Authors: E. V. Bonzi, G. B. Grad, A. M. Maggi, M. R. Muñóz

Abstract: The voice laboratory permits to study the human voices using a method that is objective and noninvasive. In this work, we have studied the parameters of the human voice such as pitch, formant, jitter, shimmer and harmonic-noise ratio of a group of young people. This statistical information of parameters is obtained from Argentinian speakers.

Submitted 18 December, 2014; originally announced August 2015.

Comments: 5 pages, 6 figures

Journal ref: Papers in Physics 6, 060002 (2014)

arXiv:1401.3482

doi 10.1613/jair.2805

Enhancing QA Systems with Complex Temporal Question Processing Capabilities

Authors: Estela Saquete, Jose Luis Vicedo, Patricio Martínez-Barco, Rafael Muñoz, Hector Llorens

Abstract: This paper presents a multilayered architecture that enhances the capabilities of current QA systems and allows different types of complex questions or queries to be processed. The answers to these questions need to be gathered from factual information scattered throughout different documents. Specifically, we designed a specialized layer to process the different types of temporal questions. Complex temporal questions are first decomposed into simple questions, according to the temporal relations expressed in the original question. In the same way, the answers to the resulting simple questions are recomposed, fulfilling the temporal restrictions of the original complex question. A novel aspect of this approach resides in the decomposition which uses a minimal quantity of resources, with the final aim of obtaining a portable platform that is easily extensible to other languages. In this paper we also present a methodology for evaluation of the decomposition of the questions as well as the ability of the implemented temporal layer to perform at a multilingual level. The temporal layer was first performed for English, then evaluated and compared with: a) a general purpose QA system (F-measure 65.47% for QA plus English temporal layer vs. 38.01% for the general QA system), and b) a well-known QA system. Much better results were obtained for temporal questions with the multilayered system. This system was therefore extended to Spanish and very good results were again obtained in the evaluation (F-measure 40.36% for QA plus Spanish temporal layer vs. 22.94% for the general QA system).

Submitted 15 January, 2014; originally announced January 2014.

Journal ref: Journal Of Artificial Intelligence Research, Volume 35, pages 775-811, 2009

arXiv:1111.6121

Virtual Worlds as a Support to Engineering Teaching

Authors: Roberto Muñoz, Marta Barría, Cristian Rusu

Abstract: Virtual Worlds (VWs) are an emerging technology used by a growing number of educational institutions around the world. It is an environment, a way of learning and an educational tool that allows different levels of online interaction. In the course "Programming I", of the career Informatics Engineering at Universidad de Valparaíso, we conducted a pilot experience with the VW of Second Life, in order to evaluate the potential of using VWs in the teaching practice.

Submitted 25 November, 2011; originally announced November 2011.

Comments: XIII Chilean Congress on Higher Education in Computer Science, CCESC'2011

arXiv:1107.0234

Unbounded Contention Resolution in Multiple-Access Channels

Authors: Antonio Fernández Anta, Miguel A. Mosteiro, Jorge Ramón Muñoz

Abstract: A frequent problem in settings where a unique resource must be shared among users is how to resolve the contention that arises when all of them must use it, but the resource allows only for one user each time. The application of efficient solutions for this problem spans a myriad of settings such as radio communication networks or databases. For the case where the number of users is unknown, recent work has yielded fruitful results for local area networks and radio networks, although either a (possibly loose) upper bound on the number of users needs to be known, or the solution is suboptimal, or it is only implicit or embedded in other problems, with bounds proved only asymptotically. In this paper, under the assumption that collision detection or information on the number of contenders is not available, we present a novel protocol for contention resolution in radio networks, and we recreate a protocol previously used for other problems, tailoring the constants for our needs. In contrast with previous work, both protocols are proved to be optimal up to a small constant factor and with high probability for big enough number of contenders. Additionally, the protocols are evaluated and contrasted with the previous work by extensive simulations. The evaluation shows that the complexity bounds obtained by the analysis are rather tight, and that both protocols proposed have small and predictable complexity for many system sizes (unlike previous proposals).

Submitted 1 July, 2011; originally announced July 2011.

Comments: 21 pages, 1 figure. To appear in DISC 2011

MSC Class: 68Q87; ACM Class: F.2.2

Showing 1–11 of 11 results for author: Muñóz, R