-
Extending Environments To Measure Self-Reflection In Reinforcement Learning
Authors:
Samuel Allen Alexander,
Michael Castaneda,
Kevin Compher,
Oscar Martinez
Abstract:
We consider an extended notion of reinforcement learning in which the environment can simulate the agent and base its outputs on the agent's hypothetical behavior. Since good performance usually requires paying attention to whatever things the environment's outputs are based on, we argue that for an agent to achieve on-average good performance across many such extended environments, it is necessary for the agent to self-reflect. Thus weighted-average performance over the space of all suitably well-behaved extended environments could be considered a way of measuring how self-reflective an agent is. We give examples of extended environments and introduce a simple transformation which experimentally seems to increase some standard RL agents' performance in a certain type of extended environment.
Submitted 19 July, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
DocParser: Hierarchical Structure Parsing of Document Renderings
Authors:
Johannes Rausch,
Octavio Martinez,
Fabian Bissig,
Ce Zhang,
Stefan Feuerriegel
Abstract:
Translating renderings (e.g., PDFs, scans) into hierarchical document structures is extensively demanded in the daily routines of many real-world applications. However, a holistic, principled approach to inferring the complete hierarchical structure of documents is missing. As a remedy, we developed "DocParser": an end-to-end system for parsing the complete document structure, including all text elements, nested figures, tables, and table cell structures. Our second contribution is to provide a dataset for evaluating hierarchical document structure parsing. Our third contribution is to propose a scalable learning framework for settings where domain-specific data are scarce, which we address by a novel approach to weak supervision that significantly improves the document structure parsing performance. Our experiments confirm the effectiveness of our proposed weak supervision: compared to the baseline without weak supervision, it improves the mean average precision for detecting document entities by 39.1% and improves the F1 score of classifying hierarchical relations by 35.8%.
Submitted 25 January, 2021; v1 submitted 5 November, 2019;
originally announced November 2019.
-
Effects of Different Hand-Grounding Locations on Haptic Performance With a Wearable Kinesthetic Haptic Device
Authors:
Sajid Nisar,
Melisa Orta Martinez,
Takahiro Endo,
Fumitoshi Matsuno,
Allison M. Okamura
Abstract:
Grounding of kinesthetic feedback against a user's hand can increase the portability and wearability of a haptic device. However, the effects of different hand-grounding locations on the haptic perception of a user are unknown. In this letter, we investigate the effects of three different hand-grounding locations (back of the hand, proximal phalanx of the index finger, and middle phalanx of the index finger) on haptic perception using a newly designed wearable haptic device. The novel device can provide kinesthetic feedback to the user's index finger in two directions: along the finger axis and in the finger's flexion-extension movement direction. We measure users' haptic perception for each grounding location through a psychophysical experiment for each of the two feedback directions. Results show that among the studied locations, grounding at the proximal phalanx has a smaller average just-noticeable difference for both feedback directions, indicating a more sensitive haptic perception. The realism of the haptic feedback, based on user ratings, was the highest with grounding at the middle phalanx for feedback along the finger axis, and at the proximal phalanx for feedback in the flexion-extension direction. Users identified the haptic feedback as most comfortable with grounding at the back of the hand for feedback along the finger axis and at the proximal phalanx for feedback in the flexion-extension direction. These findings show that the choice of grounding location has a significant impact on the user's haptic perception and qualitative experience. The results provide insights for designing next-generation wearable hand-grounded kinesthetic devices to achieve better haptic performance and user experience in virtual reality and teleoperated robotic applications.
Submitted 2 June, 2019;
originally announced June 2019.
-
Learning Disentangled Representations with Reference-Based Variational Autoencoders
Authors:
Adria Ruiz,
Oriol Martinez,
Xavier Binefa,
Jakob Verbeek
Abstract:
Learning disentangled representations from visual data, where different high-level generative factors are independently encoded, is of importance for many computer vision tasks. Solving this problem, however, typically requires explicitly labeling all the factors of interest in training images. To alleviate the annotation cost, we introduce a learning setting which we refer to as "reference-based disentangling". Given a pool of unlabeled images, the goal is to learn a representation where a set of target factors are disentangled from others. The only supervision comes from an auxiliary "reference set" containing images where the factors of interest are constant. In order to address this problem, we propose reference-based variational autoencoders, a novel deep generative model designed to exploit the weak supervision provided by the reference set. By addressing tasks such as feature learning, conditional image generation or attribute transfer, we validate the ability of the proposed model to learn disentangled representations from this minimal form of supervision.
Submitted 24 January, 2019;
originally announced January 2019.
-
Superresolution method for data deconvolution by superposition of point sources
Authors:
Sandra Martínez,
Oscar E. Martínez
Abstract:
In this work we present a new algorithm for data deconvolution that allows the retrieval of the target function with super-resolution using a simple approach: after a precise measurement of the instrument response function (IRF), the measured data are fit by a superposition of point sources (SUPPOSe) of equal intensity. In this manner only the positions of the sources need to be determined by an algorithm that minimizes the norm of the difference between the measured data and the convolution of the superposed point sources with the IRF. An upper bound for the uncertainty in the position of the sources was derived and two very different experimental situations were used for the test (an optical spectrum and fluorescent microscopy images), showing excellent reconstructions and agreement with the predicted uncertainties, achieving λ/10 resolution for the microscope and a fivefold improvement in the spectral resolution for the spectrometer. The method also provides a way to determine the optimum number of sources to be used for the fit.
Submitted 5 December, 2018; v1 submitted 8 May, 2018;
originally announced May 2018.
-
Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database
Authors:
Adriana Fernandez-Lopez,
Oriol Martinez,
Federico M. Sukno
Abstract:
Speech is the most used communication method between humans and it involves the perception of auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, although the video can provide information that is complementary to the audio. Exploiting the visual information, however, has proven challenging. On one hand, researchers have reported that the mapping between phonemes and visemes (visual units) is one-to-many because there are phonemes which are visually similar and indistinguishable from one another. On the other hand, it is known that some people are very good lip-readers (e.g., deaf people). We study the limit of visual-only speech recognition in controlled conditions. With this goal, we designed a new database in which the speakers are aware of being read and aim to facilitate lip-reading. In the literature, there are discrepancies on whether hearing-impaired people are better lip-readers than normal-hearing people. We therefore analyze whether there are differences between the lip-reading abilities of 9 hearing-impaired and 15 normal-hearing people. Finally, human abilities are compared with the performance of a visual automatic speech recognition system. In our tests, hearing-impaired participants outperformed the normal-hearing participants but without reaching statistical significance. Human observers were able to decode 44% of the spoken message. In contrast, the visual-only automatic system achieved a word recognition rate of 20%. However, if we repeat the comparison in terms of phonemes, both obtained very similar recognition rates, just above 50%. This suggests that the gap between human lip-reading and automatic speech-reading might be more related to the use of context than to the ability to interpret mouth appearance.
Submitted 26 April, 2017;
originally announced April 2017.
-
Geometry-Based Stochastic Channel Models for 5G: Extending Key Features for Massive MIMO
Authors:
Alex Oliveras Martinez,
Patrick Eggers,
Elisabeth De Carvalho
Abstract:
This paper introduces three key features in geometry-based stochastic channel models in order to include massive MIMO channels. Those key features consist of multi-user (MU) consistency, non-stationarities across the base station array and inclusion of spherical wave modelling. To ensure MU consistency, we introduce the concept of "user aura", which is a circle around the user with radius defined according to the stationarity interval. The overlap between auras determines the share of common clusters among users. To model non-stationarities across a massive array, sub-arrays are defined for which clusters are independently generated. Finally, we describe a procedure to incorporate spherical wave modelling, where a cluster focal point is defined to account for distance between user and cluster.
Submitted 19 September, 2016;
originally announced September 2016.
-
Design and Performance Analysis of Non-Coherent Detection Systems with Massive Receiver Arrays
Authors:
Lishuai Jing,
Elisabeth De Carvalho,
Petar Popovski,
Alex Oliveras Martinez
Abstract:
Harvesting the gain of a large number of antennas in a mmWave band has mainly relied on the costly operation of channel state information (CSI) acquisition and cumbersome phase shifters. Recent works have started to investigate the possibility of using receivers based on energy detection (ED), where a single data stream is decoded based on the channel and noise energy. The asymptotic features of the massive receiver array lead to a system where the impact of the noise becomes predictable due to a noise hardening effect. This in effect extends the communication range compared to the receiver with a small number of antennas, as the latter is limited by the unpredictability of the additive noise. When the channel has a large number of spatial degrees of freedom, the system becomes robust to imperfect channel knowledge due to channel hardening. We propose two detection methods based on the instantaneous and average channel energy, respectively. Meanwhile, we design the detection thresholds based on the asymptotic properties of the received energy. Unlike existing works, we analyze the scaling law behavior of the symbol-error-rate (SER). When the instantaneous channel energy is known, the performance of ED approaches that of coherent detection in high SNR scenarios. When the receiver relies on the average channel energy, our performance analysis is based on the exact SER, rather than an approximation. It is shown that the logarithm of SER decreases linearly as a function of the number of antennas. Additionally, a saturation appears at high SNR for PAM constellations of order larger than two, due to the uncertainty on the channel energy. Simulation results show that ED, with a much lower complexity, achieves promising performance both in Rayleigh fading channels and in sparse channels.
Submitted 20 June, 2016;
originally announced June 2016.
-
In a World That Counts: Clustering and Detecting Fake Social Engagement at Scale
Authors:
Yixuan Li,
Oscar Martinez,
Xing Chen,
Yi Li,
John Hopcroft
Abstract:
How can web services that depend on user-generated content discern fake social engagement activities by spammers from legitimate ones? In this paper, we focus on the social site of YouTube and the problem of identifying bad actors posting inorganic content and inflating the count of social engagement metrics. We propose an effective method, Leas (Local Expansion at Scale), and show how the fake engagement activities on YouTube can be tracked over time by analyzing the temporal graph based on the engagement behavior pattern between users and YouTube videos. With the domain knowledge of spammer seeds, we formulate and tackle the problem in a semi-supervised manner (with the objective of searching for individuals that have a similar pattern of behavior to the known seeds) based on a graph diffusion process via local spectral subspace. We offer a fast, scalable MapReduce deployment adapted from the localized spectral clustering algorithm. We demonstrate the effectiveness of our deployment at Google by achieving a manual review accuracy of 98% on the YouTube Comments graph in practice. Compared with the state-of-the-art algorithm CopyCatch, Leas runs 10 times faster. Leas is actively in use at Google, searching for daily deceptive practices on YouTube's engagement graph spanning over a billion users.
Submitted 20 January, 2016; v1 submitted 16 December, 2015;
originally announced December 2015.
-
Towards Very Large Aperture Massive MIMO: a measurement based study
Authors:
Àlex Oliveras Martínez,
Elisabeth De Carvalho,
Jesper Ødum Nielsen
Abstract:
Massive MIMO is a new technique for wireless communications that claims to offer very high system throughput and energy efficiency in multi-user scenarios. The cost is to add a very large number of antennas at the base station. Theoretical research has probed these benefits, but very few measurements have shown the potential of Massive MIMO in practice. We investigate the properties of measured Massive MIMO channels in a large indoor venue. We describe a measurement campaign using 3 arrays having different shape and aperture, with 64 antennas and 8 users with 2 antennas each. We focus on the impact of the array aperture, which is the main limiting factor in the degrees of freedom available in the multiple antenna channel. We find that performance is improved as the aperture increases, with an impact mostly visible in crowded scenarios where the users are closely spaced. We also test MIMO capability within the same user device with user proximity effect. We see a good channel resolvability with confirmation of the strong effect of the user hand grip. Finally, we highlight that propagation conditions where line-of-sight is dominant can be favorable.
Submitted 22 July, 2015;
originally announced July 2015.
-
An Efficient Algorithm to Calculate the Center of the Biggest Inscribed Circle in an Irregular Polygon
Authors:
Oscar Martinez
Abstract:
In this paper, an efficient algorithm to find the center of the biggest circle inscribed in a given polygon is described. This work was inspired by the publication of Daniel Garcia-Castellanos & Umberto Lombardo and their algorithm used to find a landmass's poles of inaccessibility. Two more efficient algorithms were found, one of them only applicable when the problem can be described as a linear problem, as in the case of a convex polygon.
Keywords: distance geometry, euclidean distance, inscribed circle, irregular polygon, algorithm, mathematical optimization, Monte Carlo, linear programming, maximin
Submitted 13 December, 2012;
originally announced December 2012.