-
Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru
Authors:
Zining Wang,
Paul Reisert,
Eric Nichols,
Randy Gomez
Abstract:
Social robots aim to establish long-term bonds with humans through engaging conversation. However, traditional conversational approaches, reliant on scripted interactions, often fall short in maintaining engaging conversations. This paper addresses this limitation by integrating large language models (LLMs) into social robots to achieve more dynamic and expressive conversations. We introduce a fully-automated conversation system that leverages LLMs to generate robot responses with expressive behaviors, congruent with the robot's personality. We incorporate robot behavior with two modalities: 1) a text-to-speech (TTS) engine capable of various delivery styles, and 2) a library of physical actions for the robot. We develop a custom, state-of-the-art emotion recognition model to dynamically select the robot's tone of voice and utilize emojis from LLM output as cues for generating robot actions. A demo of our system is available here. To illuminate design and implementation issues, we conduct a pilot study where volunteers chat with a social robot using our proposed system, and we analyze their feedback, conducting a rigorous error analysis of chat transcripts. Feedback was overwhelmingly positive, with participants commenting on the robot's empathy, helpfulness, naturalness, and entertainment. Most negative feedback was due to automatic speech recognition (ASR) errors which had limited impact on conversations. However, we observed a small class of errors, such as the LLM repeating itself or hallucinating fictitious information and human responses, that have the potential to derail conversations, raising important issues for LLM application.
Submitted 18 February, 2024;
originally announced February 2024.
-
Developing Autonomous Robot-Mediated Behavior Coaching Sessions with Haru
Authors:
Matouš Jelínek,
Eric Nichols,
Randy Gomez
Abstract:
This study presents an empirical investigation into the design and impact of autonomous dialogues in human-robot interaction for behavior change coaching. We focus on the use of Haru, a tabletop social robot, and explore the implementation of the Tiny Habits method for fostering positive behavior change. The core of our study lies in developing a fully autonomous dialogue system that maximizes Haru's emotional expressiveness and unique personality. Our methodology involved iterative design and extensive testing of the dialogue system, ensuring it effectively embodied the principles of the Tiny Habits method while also incorporating strategies for trust-raising and trust-dampening. The effectiveness of the final version of the dialogue was evaluated in an experimental study with human participants (N=12). The results indicated a significant improvement in perceptions of Haru's liveliness, interactivity, and neutrality. Additionally, our study contributes to the broader understanding of dialogue design in social robotics, offering practical insights for future developments in the field.
Submitted 18 February, 2024;
originally announced February 2024.
-
Domination and packing in graphs
Authors:
Renzo Gómez,
Juan Gutiérrez
Abstract:
Given a graph~$G$, the domination number, denoted by~$γ(G)$, is the minimum cardinality of a dominating set in~$G$. Dual to the notion of domination number is the packing number of a graph. A packing of~$G$ is a set of vertices whose pairwise distance is at least three. The packing number~$ρ(G)$ of~$G$ is the maximum cardinality of one such set. Furthermore, the inequality~$ρ(G) \leq γ(G)$ is well-known. Henning et al.\ conjectured that~$γ(G) \leq 2ρ(G)+1$ if~$G$ is subcubic. In this paper, we progress towards this conjecture by showing that~${γ(G) \leq \frac{120}{49}ρ(G)}$ if~$G$ is a bipartite cubic graph. We also show that $γ(G) \leq 3ρ(G)$ if~$G$ is a maximal outerplanar graph, and that~$γ(G) \leq 2ρ(G)$ if~$G$ is a biconvex graph. Moreover, in the last case, we show that this upper bound is tight.
Submitted 8 February, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
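The definitions in the abstract above can be checked on small graphs by exhaustive search. The following is a minimal sketch, not the authors' method: the function names and the adjacency-dictionary representation are assumptions made for illustration, and the brute-force search is only practical for graphs with a handful of vertices.

```python
from collections import deque
from itertools import combinations

def distances_from(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def domination_number(adj):
    """Smallest k such that some k-subset dominates every vertex."""
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            covered = set(subset)
            for u in subset:
                covered.update(adj[u])
            if len(covered) == len(vertices):
                return k
    return 0  # graph with no vertices

def packing_number(adj):
    """Largest k such that some k-subset has pairwise distance at least three."""
    vertices = list(adj)
    dist = {u: distances_from(adj, u) for u in vertices}
    for k in range(len(vertices), 0, -1):
        for subset in combinations(vertices, k):
            if all(dist[u].get(v, float("inf")) >= 3
                   for u, v in combinations(subset, 2)):
                return k
    return 0  # graph with no vertices

def cycle(n):
    """Adjacency dictionary of the cycle on n vertices (hypothetical helper)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

c7 = cycle(7)
gamma, rho = domination_number(c7), packing_number(c7)
print(gamma, rho)
```

On the 7-cycle, which is subcubic, this gives a domination number of 3 and a packing number of 2, consistent with both the known inequality and the conjectured bound quoted in the abstract.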
-
A Study on Social Robot Behavior in Group Conversation
Authors:
Tung Nguyen,
Eric Nichols,
Randy Gomez
Abstract:
Recently, research in human-robot interaction has begun to consider a robot's influence at the group level. Despite the recent growth in research investigating the effects of robots within groups of people, our overall understanding of what happens when robots are placed within groups or teams of people is still limited. This paper investigates several key problems for social robots that manage conversations in a group setting, where the number of participants is more than two. In a group setting, conversation dynamics are considerably more complicated than in a conventional one-to-one conversation, so there are more challenges that need to be solved.
Submitted 20 December, 2023; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Social Robot Mediator for Multiparty Interaction
Authors:
Manith Adikari,
Angelo Cangelosi,
Randy Gomez
Abstract:
A social robot acting as a 'mediator' can enhance interactions between humans, for example, in fields such as education and healthcare. A particularly promising area of research is the use of a social robot mediator in a multiparty setting, which tends to be the most applicable in real-world scenarios. However, research in social robot mediation for multiparty interactions is still emerging and faces numerous challenges. This paper provides an overview of social robotics and mediation research by highlighting relevant literature and some of the ongoing problems. The importance of incorporating relevant psychological principles for developing social robot mediators is also presented. Additionally, the potential of implementing a Theory of Mind in a social robot mediator is explored, given how such a framework could greatly improve mediation by reading the individual and group mental states to interact effectively.
Submitted 20 October, 2023;
originally announced October 2023.
-
Speech Wikimedia: A 77 Language Multilingual Speech Dataset
Authors:
Rafael Mosquera Gómez,
Julián Eusse,
Juan Ciro,
Daniel Galvez,
Ryan Hileman,
Kurt Bollacker,
David Kanter
Abstract:
The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons. It includes 1780 hours (195 GB) of CC-BY-SA licensed transcribed speech from a diverse set of scenarios and speakers, in 77 different languages. Each audio file has one or more transcriptions in different languages, making this dataset suitable for training speech recognition, speech translation, and machine translation models.
Submitted 29 August, 2023;
originally announced August 2023.
-
Two to Five Truths in Non-Negative Matrix Factorization
Authors:
John M. Conroy,
Neil P Molino,
Brian Baughman,
Rod Gomez,
Ryan Kaliszewski,
Nicholas A. Lines
Abstract:
In this paper, we explore the role of matrix scaling on a matrix of counts when building a topic model using non-negative matrix factorization. We present a scaling inspired by the normalized Laplacian (NL) for graphs that can greatly improve the quality of a non-negative matrix factorization. The results parallel those in the spectral graph clustering work of \cite{Priebe:2019}, where the authors proved adjacency spectral embedding (ASE) spectral clustering was more likely to discover core-periphery partitions and Laplacian Spectral Embedding (LSE) was more likely to discover affinity partitions. In text analysis, non-negative matrix factorization (NMF) is typically used on a matrix of co-occurrence ``contexts'' and ``terms'' counts. The matrix scaling inspired by LSE gives significant improvement for text topic models in a variety of datasets. We illustrate how matrix scaling in NMF can greatly improve the quality of a topic model on three datasets where human annotation is available. Using the adjusted Rand index (ARI), a measure of cluster similarity, we see an increase of 50\% for Twitter data and over 200\% for a newsgroup dataset versus using counts, which is the analogue of ASE. For clean data, such as those from the Document Understanding Conference, NL gives over 40\% improvement over ASE. We conclude with some analysis of this phenomenon and some connections of this scaling with other matrix scaling methods.
Submitted 5 September, 2023; v1 submitted 6 May, 2023;
originally announced May 2023.
-
Towards AutoQML: A Cloud-Based Automated Circuit Architecture Search Framework
Authors:
Raúl Berganza Gómez,
Corey O'Meara,
Giorgio Cortiana,
Christian B. Mendl,
Juan Bernabé-Moreno
Abstract:
The learning process of classical machine learning algorithms is tuned by hyperparameters that need to be customized to best learn and generalize from an input dataset. In recent years, Quantum Machine Learning (QML) has been gaining traction as a possible application of quantum computing which may provide quantum advantage in the future. However, quantum versions of classical machine learning algorithms introduce a plethora of additional parameters and circuit variations that have their own intricacies in being tuned.
In this work, we take the first steps towards Automated Quantum Machine Learning (AutoQML). We propose a concrete description of the problem, and then develop a classical-quantum hybrid cloud architecture that allows for parallelized hyperparameter exploration and model training.
As an application use-case, we train a quantum Generative Adversarial neural Network (qGAN) to generate energy prices that follow a known historic data distribution. Such a QML model can be used for various applications in the energy economics sector.
Submitted 16 February, 2022;
originally announced February 2022.
-
Path eccentricity of graphs
Authors:
Renzo Gómez,
Juan Gutiérrez
Abstract:
Let $G$ be a connected graph. The eccentricity of a path $P$, denoted by ecc$_G(P)$, is the maximum distance from $P$ to any vertex in $G$. In the \textsc{Central path} (CP) problem our aim is to find a path of minimum eccentricity. This problem was introduced by Cockayne et al., in 1981, in the study of different centrality measures on graphs. They showed that CP can be solved in linear time in trees, but it is known to be NP-hard in many classes of graphs such as chordal bipartite graphs, planar 3-connected graphs, split graphs, etc.
We investigate the path eccentricity of a connected graph~$G$ as a parameter. Let pe$(G)$ denote the value of ecc$_G(P)$ for a central path $P$ of $G$. We obtain tight upper bounds for pe$(G)$ in some graph classes. We show that pe$(G) \leq 1$ on biconvex graphs and that pe$(G) \leq 2$ on bipartite convex graphs. Moreover, we design algorithms that find such a path in linear time. On the other hand, by investigating the longest paths of a graph, we obtain tight upper bounds for pe$(G)$ on general graphs and $k$-connected graphs.
Finally, we study the relation between a central path and a longest path in a graph. We show that on trees, and bipartite permutation graphs, a longest path is also a central path. Furthermore, for superclasses of these graphs, we exhibit counterexamples for this property.
Submitted 5 February, 2022;
originally announced February 2022.
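The path eccentricity defined in the abstract above can be computed for a given path with a multi-source breadth-first search, and pe$(G)$ can be found on very small graphs by trying every simple path. The following is an illustrative sketch only: the function names and the example tree are assumptions, and the exhaustive search is exponential, unlike the linear-time algorithms the abstract mentions.

```python
from collections import deque

def path_eccentricity(adj, path):
    """Maximum distance from any vertex of the graph to its nearest path vertex."""
    dist = {v: 0 for v in path}
    queue = deque(path)  # multi-source BFS starting from every path vertex
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != len(adj):
        raise ValueError("graph must be connected")
    return max(dist.values())

def all_simple_paths(adj):
    """Yield every simple path (as a vertex list) by depth-first extension."""
    def extend(path, seen):
        yield list(path)
        for v in adj[path[-1]]:
            if v not in seen:
                path.append(v)
                seen.add(v)
                yield from extend(path, seen)
                path.pop()
                seen.remove(v)
    for start in adj:
        yield from extend([start], {start})

def graph_path_eccentricity(adj):
    """pe(G): the eccentricity of a central path, found by exhaustive search."""
    return min(path_eccentricity(adj, p) for p in all_simple_paths(adj))

# Hypothetical example: a tree with centre "c" and three legs of length two.
spider = {
    "c": ["a1", "b1", "d1"],
    "a1": ["c", "a2"], "a2": ["a1"],
    "b1": ["c", "b2"], "b2": ["b1"],
    "d1": ["c", "d2"], "d2": ["d1"],
}
longest = ["a2", "a1", "c", "b1", "b2"]
print(path_eccentricity(spider, longest), graph_path_eccentricity(spider))
```

In this tree the longest path attains the minimum eccentricity, which matches the abstract's statement that a longest path in a tree is also a central path.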
-
GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback
Authors:
Jie Huang,
Rongshun Juan,
Randy Gomez,
Keisuke Nakamura,
Qixin Sha,
Bo He,
Guangliang Li
Abstract:
Deep reinforcement learning (DRL) has achieved great successes in many simulated tasks. The sample inefficiency problem makes applying traditional DRL methods to real-world robots a great challenge. Generative Adversarial Imitation Learning (GAIL) -- a general model-free imitation learning method, allows robots to directly learn policies from expert trajectories in large environments. However, GAIL shares the limitation of other imitation learning methods that they can seldom surpass the performance of demonstrations. In this paper, to address the limit of GAIL, we propose GAN-Based Interactive Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback by combining the advantages of GAIL and interactive reinforcement learning. We tested our proposed method in six physics-based control tasks, ranging from simple low-dimensional control tasks -- Cart Pole and Mountain Car, to difficult high-dimensional tasks -- Inverted Double Pendulum, Lunar Lander, Hopper and HalfCheetah. Our results suggest that with both optimal and suboptimal demonstrations, a GAIRL agent can always learn a more stable policy with optimal or close to optimal performance, while the performance of the GAIL agent is upper bounded by the performance of demonstrations or even worse than it. In addition, our results indicate the reason that GAIRL is superior over GAIL is the complementary effect of demonstrations and human evaluative feedback.
Submitted 13 April, 2021;
originally announced April 2021.
-
Efficient Competitions and Online Learning with Strategic Forecasters
Authors:
Rafael Frongillo,
Robert Gomez,
Anish Thilagar,
Bo Waggoner
Abstract:
Winner-take-all competitions in forecasting and machine-learning suffer from distorted incentives. Witkowski et al. 2018 identified this problem and proposed ELF, a truthful mechanism to select a winner. We show that, from a pool of $n$ forecasters, ELF requires $Θ(n\log n)$ events or test data points to select a near-optimal forecaster with high probability. We then show that standard online learning algorithms select an $ε$-optimal forecaster using only $O(\log(n) / ε^2)$ events, by way of a strong approximate-truthfulness guarantee. This bound matches the best possible even in the nonstrategic setting. We then apply these mechanisms to obtain the first no-regret guarantee for non-myopic strategic experts.
Submitted 10 June, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
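As a rough illustration of the "standard online learning algorithms" mentioned in the abstract above, the sketch below runs multiplicative weights (Hedge) over a pool of forecasters scored by quadratic loss. Whether this matches the specific algorithm analyzed in the paper is an assumption, as are the function name, the toy data, and the learning rate, which is a common textbook choice for losses in [0, 1].

```python
import math

def hedge_weights(forecasts, outcomes, eta):
    """Final multiplicative-weights distribution over forecasters.

    forecasts[t][i] is the probability forecaster i assigns to event t;
    outcomes[t] is 0 or 1. Both names are illustrative assumptions.
    """
    n = len(forecasts[0])
    log_weights = [0.0] * n
    for probs, y in zip(forecasts, outcomes):
        for i, p in enumerate(probs):
            loss = (p - y) ** 2  # quadratic (Brier) loss, always in [0, 1]
            log_weights[i] -= eta * loss
    top = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

# Toy data: three forecasters with constant predictions, every event occurs.
num_events = 20
forecasts = [[0.9, 0.5, 0.1] for _ in range(num_events)]
outcomes = [1] * num_events
eta = math.sqrt(8 * math.log(3) / num_events)

weights = hedge_weights(forecasts, outcomes, eta)
best = weights.index(max(weights))
print(best, [round(w, 3) for w in weights])
```

The forecaster with the lowest cumulative loss ends up with the largest weight, so selecting or sampling according to these weights favours near-optimal forecasters as the number of events grows.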
-
Beyond Expertise and Roles: A Framework to Characterize the Stakeholders of Interpretable Machine Learning and their Needs
Authors:
Harini Suresh,
Steven R. Gomez,
Kevin K. Nam,
Arvind Satyanarayan
Abstract:
To ensure accountability and mitigate harm, it is critical that diverse stakeholders can interrogate black-box automated systems and find information that is understandable, relevant, and useful to them. In this paper, we eschew prior expertise- and role-based categorizations of interpretability stakeholders in favor of a more granular framework that decouples stakeholders' knowledge from their interpretability needs. We characterize stakeholders by their formal, instrumental, and personal knowledge and how it manifests in the contexts of machine learning, the data domain, and the general milieu. We additionally distill a hierarchical typology of stakeholder needs that distinguishes higher-level domain goals from lower-level interpretability tasks. In assessing the descriptive, evaluative, and generative powers of our framework, we find our more nuanced treatment of stakeholders reveals gaps and opportunities in the interpretability literature, adds precision to the design and comparison of user studies, and facilitates a more reflexive approach to conducting this research.
Submitted 24 January, 2021;
originally announced January 2021.
-
Collaborative Storytelling with Large-scale Neural Language Models
Authors:
Eric Nichols,
Leo Gao,
Randy Gomez
Abstract:
Storytelling plays a central role in human socializing and entertainment. However, much of the research on automatic storytelling generation assumes that stories will be generated by an agent without any human interaction. In this paper, we introduce the task of collaborative storytelling, where an artificial intelligence agent and a person collaborate to create a unique story by taking turns adding to it. We present a collaborative storytelling system which works with a human storyteller to create a story by generating new utterances based on the story so far. We constructed the storytelling system by tuning a publicly-available large scale language model on a dataset of writing prompts and their accompanying fictional works. We identify generating sufficiently human-like utterances to be an important technical issue and propose a sample-and-rank approach to improve utterance quality. Quantitative evaluation shows that our approach outperforms a baseline, and we present qualitative evaluation of our system's capabilities.
Submitted 19 November, 2020;
originally announced November 2020.
-
Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation
Authors:
Raul Gomez,
Yahui Liu,
Marco De Nadai,
Dimosthenis Karatzas,
Bruno Lepri,
Nicu Sebe
Abstract:
Image-to-image translation aims to learn a mapping that transforms an image from one visual domain to another. Recent works assume that image descriptors can be disentangled into a domain-invariant content representation and a domain-specific style representation. Thus, translation models seek to preserve the content of source images while changing the style to a target visual domain. However, synthesizing new images is extremely challenging, especially in multi-domain translations, as the network has to compose content and style to generate reliable and diverse images in multiple domains. In this paper we propose the use of an image retrieval system to assist the image-to-image translation task. First, we train an image-to-image translation model to map images to multiple domains. Then, we train an image retrieval model using real and generated images to find images similar to a query one in content but in a different domain. Finally, we exploit the image retrieval system to fine-tune the image-to-image translation model and generate higher quality images. Our experiments show the effectiveness of the proposed solution and highlight the contribution of the retrieval network, which can benefit from additional unlabeled data and help image-to-image translation models in the presence of scarce data.
Submitted 11 August, 2020;
originally announced August 2020.
-
Location Sensitive Image Retrieval and Tagging
Authors:
Raul Gomez,
Jaume Gibert,
Lluis Gomez,
Dimosthenis Karatzas
Abstract:
People from different parts of the globe describe objects and concepts in distinct manners. Visual appearance can thus vary across different geographic locations, which makes location relevant contextual information when analysing visual data. In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth. We present LocSens, a model that learns to rank triplets of images, tags and coordinates by plausibility, and two training strategies to balance the location influence in the final ranking. LocSens learns to fuse textual and location information of multimodal queries to retrieve related images at different levels of location granularity, and successfully utilizes location information to improve image tagging.
Submitted 7 July, 2020;
originally announced July 2020.
-
Quantification of MagLIF morphology using the Mallat Scattering Transformation
Authors:
Michael E. Glinsky,
Thomas W. Moore,
William E. Lewis,
Matthew R. Weis,
Christopher A. Jennings,
David J. Ampleford,
Patrick F. Knapp,
Eric C. Harding,
Matthew R. Gomez,
Adam J. Harvey-Thompson
Abstract:
The morphology of the stagnated plasma resulting from Magnetized Liner Inertial Fusion (MagLIF) is measured by imaging the self-emission x-rays coming from the multi-keV plasma. Equivalent diagnostic response can be generated by integrated radiation-magnetohydrodynamic (rad-MHD) simulations from programs such as HYDRA and GORGON. There have been only limited quantitative ways to compare the image morphology, that is the texture, of simulations and experiments. We have developed a metric of image morphology based on the Mallat Scattering Transformation (MST), a transformation that has proved to be effective at distinguishing textures, sounds, and written characters. This metric is designed, demonstrated, and refined by classifying ensembles (i.e., classes) of synthetic stagnation images, and by regressing an ensemble of synthetic stagnation images to the morphology (i.e., model) parameters used to generate the synthetic images. We use this metric to quantitatively compare simulations to experimental images, experimental images to each other, and to estimate the morphological parameters of the experimental images with uncertainty. This coordinate space has proved very adept at doing a sophisticated relative background subtraction in the MST space. This was needed to compare the experimental self emission images to the rad-MHD simulation images.
Submitted 15 October, 2020; v1 submitted 13 April, 2020;
originally announced May 2020.
-
Exploring Hate Speech Detection in Multimodal Publications
Authors:
Raul Gomez,
Jaume Gibert,
Lluis Gomez,
Dimosthenis Karatzas
Abstract:
In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the challenges of the proposed task. We find that, even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text. We discuss why and open the field and the dataset for further research.
Submitted 9 October, 2019;
originally announced October 2019.
-
Towards a Theory of Intentions for Human-Robot Collaboration
Authors:
Rocio Gomez,
Mohan Sridharan,
Heather Riley
Abstract:
The architecture described in this paper encodes a theory of intentions based on the key principles of non-procrastination, persistence, and automatically limiting reasoning to relevant knowledge and observations. The architecture reasons with transition diagrams of any given domain at two different resolutions, with the fine-resolution description defined as a refinement of, and hence tightly-coupled to, a coarse-resolution description. Non-monotonic logical reasoning with the coarse-resolution description computes an activity (i.e., plan) comprising abstract actions for any given goal. Each abstract action is implemented as a sequence of concrete actions by automatically zooming to and reasoning with the part of the fine-resolution transition diagram relevant to the current coarse-resolution transition and the goal. Each concrete action in this sequence is executed using probabilistic models of the uncertainty in sensing and actuation, and the corresponding fine-resolution outcomes are used to infer coarse-resolution observations that are added to the coarse-resolution history. The architecture's capabilities are evaluated in the context of a simulated robot assisting humans in an office domain, on a physical robot (Baxter) manipulating tabletop objects, and on a wheeled robot (Turtlebot) moving objects to particular places or people. The experimental results indicate improvements in reliability and computational efficiency compared with an architecture that does not include the theory of intentions, and an architecture that does not include zooming for fine-resolution reasoning.
Submitted 30 July, 2019;
originally announced July 2019.
-
Selective Style Transfer for Text
Authors:
Raul Gomez,
Ali Furkan Biten,
Lluis Gomez,
Jaume Gibert,
Marçal Rusiñol,
Dimosthenis Karatzas
Abstract:
This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions. Results on different text domains (scene text, machine printed text and handwritten text) and cross-modal results demonstrate that this is feasible, and open different research lines. Furthermore, two architectures for selective style transfer, which means transferring style to only desired image pixels, are proposed. Finally, scene text selective style transfer is evaluated as a data augmentation technique to expand scene text detection datasets, resulting in a boost in text detector performance. Our implementation of the described models is publicly available.
Submitted 4 June, 2019;
originally announced June 2019.
-
Software System Design based on Patterns for Newton-Type Methods
Authors:
Ricardo Serrato Barrera,
Gustavo Rodríguez Gómez,
Julio César Pérez Sansalvador,
Saul E. Pomares Hernández,
Leticia Flores Pulido,
Antonio Muñoz
Abstract:
A wide range of engineering applications uses optimisation techniques as part of their solution process. The researcher uses specialized software that implements well-known optimisation techniques to solve his problem. However, when it comes to developing original optimisation techniques that fit a particular problem, the researcher has no option but to implement his own new method from scratch. This leads to long development times and error-prone code that, in general, will not be reused for any other application. In this work, we present a novel methodology that simplifies, speeds up, and improves the development process of scientific software. This methodology guides us in the identification of design patterns. The application of this methodology generates reusable, flexible, and high-quality scientific software. Furthermore, the produced software becomes a documented tool to transfer the knowledge on the development process of scientific software. We apply this methodology to the design of an optimisation framework implementing Newton-type methods, which can be used as a fast prototyping tool for new optimisation techniques based on Newton-type methods. The abstraction, reusability, and flexibility of the developed framework are measured by means of Martin's metric. The results indicate that the developed software is highly reusable.
Submitted 12 May, 2019;
originally announced May 2019.
-
Improving Interactive Reinforcement Agent Planning with Human Demonstration
Authors:
Guangliang Li,
Randy Gomez,
Keisuke Nakamura,
Jinying Lin,
Qilei Zhang,
Bo He
Abstract:
TAMER has proven to be a powerful interactive reinforcement learning method for allowing ordinary people to teach and personalize autonomous agents' behavior by providing evaluative feedback. However, a TAMER agent planning with UCT, a Monte Carlo Tree Search strategy, can only update states along its path and might incur a high learning cost, especially for a physical robot. In this paper, we propose to drive the agent's exploration along the optimal path and reduce the learning cost by initializing the agent's reward function via inverse reinforcement learning from demonstration. We test our proposed method in the RL benchmark domain Grid World with different discounts on human reward. Our results show that learning from demonstration can allow a TAMER agent to learn a roughly optimal policy up to the deepest search and encourage the agent to explore along the optimal path. In addition, we find that learning from demonstration can improve learning efficiency by reducing the total feedback and the number of incorrect actions and by increasing the ratio of correct actions needed to obtain an optimal policy, allowing a TAMER agent to converge faster.
Submitted 18 April, 2019;
originally announced April 2019.
-
Self-Supervised Learning from Web Data for Multimodal Retrieval
Authors:
Raul Gomez,
Lluis Gomez,
Jaume Gibert,
Dimosthenis Karatzas
Abstract:
Self-supervised learning from multimodal image and text data allows deep neural networks to learn powerful features with no need for human-annotated data. Web and Social Media platforms provide a virtually unlimited amount of this multimodal data. In this work we propose to exploit this freely available data to learn a multimodal image and text embedding, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the proposed pipeline can learn from images with associated text without supervision and analyze the semantic structure of the learnt joint image and text embedding space. We perform a thorough analysis and performance comparison of five different state-of-the-art text embeddings on three different benchmarks. We show that the embeddings learnt with Web and Social Media data are competitive with supervised methods in the text-based image retrieval task, and we clearly outperform the state of the art on the MIRFlickr dataset when training on the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts, that can be used for fair comparison of image-text embeddings.
Submitted 7 January, 2019;
originally announced January 2019.
-
Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods
Authors:
Raul Gomez,
Lluis Gomez,
Jaume Gibert,
Dimosthenis Karatzas
Abstract:
Mass tourism is becoming a big problem for some cities, such as Barcelona, due to its concentration in some neighborhoods. In this work we gather Instagram data related to Barcelona consisting of image-caption pairs and, using the text as a supervisory signal, we learn relations between images, words, and neighborhoods. Our goal is to learn which visual elements appear in photos when people are posting about each neighborhood. We treat the data separately by language and show that this can be extrapolated to a separate analysis of tourists and locals, and that tourism is reflected in Social Media at a neighborhood level. The presented pipeline allows analyzing the differences between the images that tourists and locals associate with the different neighborhoods. The proposed method, which can be extended to other cities or subjects, shows that Instagram data can be used to train multi-modal (image and text) machine learning models that are useful for analyzing publications about a city at a neighborhood level. We publish the collected dataset, InstaBarcelona, and the code used in the analysis.
Submitted 20 August, 2018;
originally announced August 2018.
-
Learning to Learn from Web Data through Deep Semantic Embeddings
Authors:
Raul Gomez,
Lluis Gomez,
Jaume Gibert,
Dimosthenis Karatzas
Abstract:
In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision and perform a thorough analysis of five different text embeddings on three different benchmarks. We show that the embeddings learnt with Web and Social Media data are competitive with supervised methods in the text-based image retrieval task, and we clearly outperform the state of the art on the MIRFlickr dataset when training on the target data. Further, we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed of Instagram images and their associated texts, that can be used for fair comparison of image-text embeddings.
Submitted 20 August, 2018;
originally announced August 2018.
-
TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces
Authors:
Yash Patel,
Lluis Gomez,
Raul Gomez,
Marçal Rusiñol,
Dimosthenis Karatzas,
C. V. Jawahar
Abstract:
The immense success of deep learning based methods in computer vision heavily relies on large-scale training datasets. These richly annotated datasets help the network learn discriminative visual features. Collecting and annotating such datasets requires a tremendous amount of human effort, and annotations are limited to a popular set of classes. As an alternative, learning visual features by designing auxiliary tasks that make use of freely available self-supervision has become increasingly popular in the computer vision community.
In this paper, we put forward an idea to take advantage of multi-modal context to provide self-supervision for the training of computer vision algorithms. We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is more likely to appear as an illustration. More specifically, we use popular text embedding techniques to provide the self-supervision for the training of a deep CNN.
Our experiments demonstrate state-of-the-art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or naturally-supervised approaches.
Submitted 4 July, 2018;
originally announced July 2018.
-
Transversals of Longest Paths
Authors:
Márcia R. Cerioli,
Cristina G. Fernandes,
Renzo Gómez,
Juan Gutiérrez,
Paloma T. Lima
Abstract:
Let $\lpt(G)$ be the minimum cardinality of a set of vertices that intersects all longest paths in a graph $G$. Let $ω(G)$ be the size of a maximum clique in $G$, and $\tw(G)$ be the treewidth of $G$. We prove that $ \lpt(G) \leq \max\{1,ω(G)-2\}$ when $G$ is a connected chordal graph; that $\lpt(G) =1$ when $G$ is a connected bipartite permutation graph or a connected full substar graph; and that $\lpt(G) \leq \tw(G)$ for any connected graph $G$.
Submitted 19 December, 2017;
originally announced December 2017.
-
Improving Text Proposals for Scene Images with Fully Convolutional Networks
Authors:
Dena Bazazian,
Raul Gomez,
Anguelos Nicolaou,
Lluis Gomez,
Dimosthenis Karatzas,
Andrew D. Bagdanov
Abstract:
Text Proposals have emerged as a class-dependent version of object proposals: efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield state-of-the-art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm of Gomez and Karatzas (2016), combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over the current state of the art.
Submitted 16 February, 2017;
originally announced February 2017.
-
QIS-XML: An Extensible Markup Language for Quantum Information Science
Authors:
Pascal Heus,
Richard Gomez
Abstract:
This Master's thesis examines issues of interoperability and integration between Classic Information Science (CIS) and Quantum Information Science (QIS). It provides a short introduction to the Extensible Markup Language (XML) and proceeds to describe the development steps that have led to a prototype XML specification for quantum computing (QIS-XML). QIS-XML is a proposed framework, based on the widely used XML standard, to describe, visualize, exchange, and process quantum gates and quantum circuits. It also provides a potential approach to a generic programming language for quantum computers through the concept of XML-driven compilers. Examples are provided for the description of commonly used quantum gates and circuits, accompanied by tools to visualize them in standard web browsers. An algorithmic example is also presented, performing a simple addition operation with quantum circuits and running the program on a quantum computer simulator. Overall, this initial effort demonstrates how XML technologies could be at the core of the architecture for describing and programming quantum computers. By leveraging a widely accepted standard, QIS-XML also builds a bridge between classic and quantum IT, which could foster the acceptance of QIS by the ICT community and facilitate the understanding of quantum technology by IT experts. This would support the consolidation of Classic Information Science and Quantum Information Science into a Complete Information Science, a challenge that could be referred to as the "Information Science Grand Unification Challenge".
Submitted 14 June, 2011;
originally announced June 2011.
-
QIS-XML: A metadata specification for Quantum Information Science
Authors:
Pascal Heus,
Richard Gomez
Abstract:
While Quantum Information Science (QIS) is still in its infancy, the ability for quantum based hardware or computers to communicate and integrate with their classical counterparts will be a major requirement towards their success. Little attention however has been paid to this aspect of QIS. To manage and exchange information between systems, today's classic Information Technology (IT) commonly uses the eXtensible Markup Language (XML) and its related tools. XML is composed of numerous specifications related to various fields of expertise. No such global specification however has been defined for quantum computers. QIS-XML is a proposed XML metadata specification for the description of fundamental components of QIS (gates & circuits) and a platform for the development of a hardware independent low level pseudo-code for quantum algorithms. This paper lays out the general characteristics of the QIS-XML specification and outlines practical applications through prototype use cases.
Submitted 23 December, 2007;
originally announced December 2007.