-
Reconstruction for Sparse View Tomography of Long Objects Applied to Imaging in the Wood Industry
Authors:
Buda Bajić,
Johannes A. J. Huber,
Benedikt Neyses,
Linus Olofsson,
Ozan Öktem
Abstract:
In the wood industry, logs are commonly quality screened by discrete X-ray scans on a moving conveyor belt from a few source positions. Typically, two-dimensional (2D) slice-wise measurements are obtained by a sequential scanning geometry. Each 2D slice alone does not carry sufficient information for a three-dimensional tomographic reconstruction in which biological features of interest in the log are well preserved. In the present work, we propose a learned iterative reconstruction method based on the Learned Primal-Dual neural network, suited for sequential scanning geometries. Our method accumulates information between neighbouring slices, instead of only accounting for single slices during reconstruction. Our quantitative and qualitative evaluations with as few as five source positions show that our method yields reconstructions of logs that are sufficiently accurate to identify biological features like knots (branches), heartwood and sapwood.
Submitted 5 March, 2024;
originally announced March 2024.
-
Towards practical reinforcement learning for tokamak magnetic control
Authors:
Brendan D. Tracey,
Andrea Michi,
Yuri Chervonyi,
Ian Davies,
Cosmin Paduraru,
Nevena Lazic,
Federico Felici,
Timo Ewalds,
Craig Donner,
Cristian Galperti,
Jonas Buchli,
Michael Neunert,
Andrea Huber,
Jonathan Evens,
Paula Kurylowicz,
Daniel J. Mankowitz,
Martin Riedmiller,
The TCV Team
Abstract:
Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control. However, there are still significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method: achieving higher control accuracy for desired plasma properties, reducing the steady-state error, and decreasing the required time to learn new tasks. We build on top of Degrave et al. (2022), and present algorithmic improvements to the agent architecture and training procedure. We present simulation results that show up to 65% improvement in shape accuracy, achieve substantial reduction in the long-term bias of the plasma current, and additionally reduce the training time required to learn new tasks by a factor of 3 or more. We present new experiments using the upgraded RL-based controllers on the TCV tokamak, which validate the simulation results achieved, and point the way towards routinely achieving accurate discharges using the RL approach.
Submitted 5 October, 2023; v1 submitted 21 July, 2023;
originally announced July 2023.
-
Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation
Authors:
Marius-Constantin Dinu,
Markus Holzleitner,
Maximilian Beck,
Hoan Duc Nguyen,
Andrea Huber,
Hamid Eghbal-zadeh,
Bernhard A. Moser,
Sergei Pereverzyev,
Sepp Hochreiter,
Werner Zellinger
Abstract:
We study the problem of choosing algorithm hyper-parameters in unsupervised domain adaptation, i.e., with labeled data in a source domain and unlabeled data in a target domain, drawn from a different input distribution. We follow the strategy to compute several models using different hyper-parameters, and, to subsequently compute a linear aggregation of the models. While several heuristics exist that follow this strategy, methods are still missing that rely on thorough theories for bounding the target error. In this turn, we propose a method that extends weighted least squares to vector-valued functions, e.g., deep neural networks. We show that the target error of the proposed algorithm is asymptotically not worse than twice the error of the unknown optimal aggregation. We also perform a large scale empirical comparative study on several datasets, including text, images, electroencephalogram, body sensor signals and signals from mobile phones. Our method outperforms deep embedded validation (DEV) and importance weighted validation (IWV) on all datasets, setting a new state-of-the-art performance for solving parameter choice issues in unsupervised domain adaptation with theoretical error guarantees. We further study several competitive heuristics, all outperforming IWV and DEV on at least five datasets. However, our method outperforms each heuristic on at least five of seven datasets.
Submitted 2 May, 2023;
originally announced May 2023.
-
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
Authors:
Tuomas Haarnoja,
Ben Moran,
Guy Lever,
Sandy H. Huang,
Dhruva Tirumala,
Jan Humplik,
Markus Wulfmeier,
Saran Tunyasuvunakool,
Noah Y. Siegel,
Roland Hafner,
Michael Bloesch,
Kristian Hartikainen,
Arunkumar Byravan,
Leonard Hasenclever,
Yuval Tassa,
Fereshteh Sadeghi,
Nathan Batchelor,
Federico Casarini,
Stefano Saliceti,
Charles Game,
Neil Sreendra,
Kushal Patel,
Marlon Gwira,
Andrea Huber,
Nicole Hurley
, et al. (3 additional authors not shown)
Abstract:
We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and it transitions between them in a smooth, stable, and efficient manner. The agent's locomotion and tactical behavior adapts to specific game contexts in a way that would be impractical to manually design. The agent also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. Our agent was trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer. Although the robots are inherently fragile, basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way -- well beyond what is intuitively expected from the robot. Indeed, in experiments, they walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives.
Submitted 11 April, 2024; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Stochastic Segmentation with Conditional Categorical Diffusion Models
Authors:
Lukas Zbinden,
Lars Doorenbos,
Theodoros Pissas,
Adrian Thomas Huber,
Raphael Sznitman,
Pablo Márquez-Neila
Abstract:
Semantic segmentation has made significant progress in recent years thanks to deep neural networks, but the common objective of generating a single segmentation output that accurately matches the image's content may not be suitable for safety-critical domains such as medical diagnostics and autonomous driving. Instead, multiple possible correct segmentation maps may be required to reflect the true distribution of annotation maps. In this context, stochastic semantic segmentation methods must learn to predict conditional distributions of labels given the image, but this is challenging due to the typically multimodal distributions, high-dimensional output spaces, and limited annotation data. To address these challenges, we propose a conditional categorical diffusion model (CCDM) for semantic segmentation based on Denoising Diffusion Probabilistic Models. Our model is conditioned to the input image, enabling it to generate multiple segmentation label maps that account for the aleatoric uncertainty arising from divergent ground truth annotations. Our experimental results show that CCDM achieves state-of-the-art performance on LIDC, a stochastic semantic segmentation dataset, and outperforms established baselines on the classical segmentation dataset Cityscapes.
Submitted 11 September, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Computing expected multiplicities for bag-TIDBs with bounded multiplicities
Authors:
Su Feng,
Boris Glavic,
Aaron Huber,
Oliver Kennedy,
Atri Rudra
Abstract:
In this work, we study the problem of computing a tuple's expected multiplicity over probabilistic databases with bag semantics (where each tuple is associated with a multiplicity) exactly and approximately. We consider bag-TIDBs where we have a bound $c$ on the maximum multiplicity of each tuple and tuples are independent probabilistic events (we refer to such databases as $c$-TIDBs). We are specifically interested in the fine-grained complexity of computing expected multiplicities and how it compares to the complexity of deterministic query evaluation algorithms -- if these complexities are comparable, it opens the door to practical deployment of probabilistic databases. Unfortunately, our results imply that computing expected multiplicities for $c$-TIDBs based on the results produced by such query evaluation algorithms introduces super-linear overhead (under parameterized complexity hardness assumptions/conjectures). We proceed to study approximation of expected result tuple multiplicities for positive relational algebra queries ($RA^+$) over $c$-TIDBs and for a non-trivial subclass of block-independent databases (BIDBs). We develop a sampling algorithm that computes a $(1\pm\epsilon)$-approximation of the expected multiplicity of an output tuple in time linear in the runtime of the corresponding deterministic query for any $RA^+$ query.
Submitted 1 July, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors
Authors:
Steven Bohez,
Saran Tunyasuvunakool,
Philemon Brakel,
Fereshteh Sadeghi,
Leonard Hasenclever,
Yuval Tassa,
Emilio Parisotto,
Jan Humplik,
Tuomas Haarnoja,
Roland Hafner,
Markus Wulfmeier,
Michael Neunert,
Ben Moran,
Noah Siegel,
Andrea Huber,
Francesco Romano,
Nathan Batchelor,
Federico Casarini,
Josh Merel,
Raia Hadsell,
Nicolas Heess
Abstract:
We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural looking behavior at the time of reuse. This makes it easy to create well-regularized, task-oriented controllers that are suitable for deployment on real robots. We demonstrate how our skill module can be used for imitation, and train controllable walking and ball dribbling policies for both the ANYmal quadruped and OP3 humanoid. These policies are then deployed on hardware via zero-shot simulation-to-reality transfer. Accompanying videos are available at https://bit.ly/robot-npmp.
Submitted 31 March, 2022;
originally announced March 2022.
-
Permutation invariant matrix statistics and computational language tasks
Authors:
Manuel Accettulli Huber,
Adriana Correia,
Sanjaye Ramgoolam,
Mehrnoosh Sadrzadeh
Abstract:
The Linguistic Matrix Theory programme introduced by Kartsaklis, Ramgoolam and Sadrzadeh is an approach to the statistics of matrices that are generated in type-driven distributional semantics, based on permutation invariant polynomial functions which are regarded as the key observables encoding the significant statistics. In this paper we generalize the previous results on the approximate Gaussianity of matrix distributions arising from compositional distributional semantics. We also introduce a geometry of observable vectors for words, defined by exploiting the graph-theoretic basis for the permutation invariants and the statistical characteristics of the ensemble of matrices associated with the words. We describe successful applications of this unified framework to a number of tasks in computational linguistics, associated with the distinctions between synonyms, antonyms, hypernyms and hyponyms.
Submitted 26 September, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs
Authors:
Dan Rosenbaum,
Marta Garnelo,
Michal Zielinski,
Charlie Beattie,
Ellen Clancy,
Andrea Huber,
Pushmeet Kohli,
Andrew W. Senior,
John Jumper,
Carl Doersch,
S. M. Ali Eslami,
Olaf Ronneberger,
Jonas Adler
Abstract:
Cryo-electron microscopy (cryo-EM) has revolutionized experimental protein structure determination. Despite advances in high resolution reconstruction, a majority of cryo-EM experiments provide either a single state of the studied macromolecule, or a relatively small number of its conformations. This reduces the effectiveness of the technique for proteins with flexible regions, which are known to play a key role in protein function. Recent methods for capturing conformational heterogeneity in cryo-EM data model it in volume space, making recovery of continuous atomic structures challenging. Here we present a fully deep-learning-based approach using variational auto-encoders (VAEs) to recover a continuous distribution of atomic protein structures and poses directly from picked particle images and demonstrate its efficacy on realistic simulated data. We hope that methods built on this work will allow incorporation of stronger prior information about protein structure and enable better understanding of non-rigid protein structures.
Submitted 26 June, 2021;
originally announced June 2021.
-
Efficient and Accurate In-Database Machine Learning with SQL Code Generation in Python
Authors:
Michael Kaufmann,
Gabriel Stechschulte,
Anna Huber
Abstract:
Following an analysis of the advantages of SQL-based Machine Learning (ML) and a short literature survey of the field, we describe a novel method for In-Database Machine Learning (IDBML). We contribute a process for SQL-code generation in Python using template macros in Jinja2 as well as the prototype implementation of the process. We describe our implementation of the process to compute multidimensional histogram (MDH) probability estimation in SQL. For this, we contribute and implement a novel discretization method called equal quantized rank binning (EQRB) and equal-width binning (EWB). Based on this, we provide data gathered in a benchmarking experiment for the quantitative empirical evaluation of our method and system using the Covertype dataset. We measured accuracy and computation time and compared them to those of state-of-the-art classification algorithms in Scikit Learn. Using EWB, our multidimensional probability estimation was the fastest of all tested algorithms, while being only 1-2% less accurate than the best state-of-the-art methods found (decision trees and random forests). Our method was significantly more accurate than Naive Bayes, which assumes independent one-dimensional probabilities and/or densities. Also, our method was significantly more accurate and faster than logistic regression. This motivates further research in accuracy improvement and in IDBML with SQL code generation for big data and larger-than-memory datasets.
Submitted 31 May, 2021; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds (extended version)
Authors:
Su Feng,
Aaron Huber,
Boris Glavic,
Oliver Kennedy
Abstract:
Certain answers are a principled method for coping with the uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Prior work introduced Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-approximation of certain answers. UA-DBs combine the reliability of certain answers based on incomplete K-relations with the performance of classical deterministic database systems. However, UA-DBs only support a limited class of queries and do not support attribute-level uncertainty, which can lead to inaccurate under-approximations of certain answers. In this paper, we introduce attribute-annotated uncertain databases (AU-DBs) which extend the UA-DB model with attribute-level annotations that record bounds on the values of an attribute across all possible worlds. This enables more precise approximations of incomplete databases. Furthermore, we extend UA-DBs to encode a compact over-approximation of possible answers, which is necessary to support non-monotone queries including aggregation and set difference. We prove that query processing over AU-DBs preserves the bounds of certain and possible answers and investigate algorithms for compacting intermediate results to retain efficiency. Through a compact encoding of possible answers, our approach also provides a solid foundation for handling missing data. Using optimizations that trade accuracy for performance, our approach scales to complex queries and large datasets, and produces accurate results. Furthermore, it significantly outperforms alternative methods for uncertain data management.
Submitted 23 February, 2021;
originally announced February 2021.
-
A Deep Learning Approach for Characterizing Major Galaxy Mergers
Authors:
Skanda Koppula,
Victor Bapst,
Marc Huertas-Company,
Sam Blackwell,
Agnieszka Grabska-Barwinska,
Sander Dieleman,
Andrea Huber,
Natasha Antropova,
Mikolaj Binkowski,
Hannah Openshaw,
Adria Recasens,
Fernando Caro,
Avishai Deke,
Yohan Dubois,
Jesus Vega Ferrero,
David C. Koo,
Joel R. Primack,
Trevor Back
Abstract:
Fine-grained estimation of galaxy merger stages from observations is a key problem useful for validation of our current theoretical understanding of galaxy formation. To this end, we demonstrate a CNN-based regression model that is able to predict, for the first time, using a single image, the merger stage relative to the first perigee passage with a median error of 38.3 million years (Myrs) over a period of 400 Myrs. This model uses no specific dynamical modeling and learns only from simulated merger events. We show that our model provides reasonable estimates on real observations, approximately matching prior estimates provided by detailed dynamical modeling. We provide a preliminary interpretability analysis of our models, and demonstrate first steps toward calibrated uncertainty estimation.
Submitted 9 February, 2021;
originally announced February 2021.
-
Boom, Bust, and Bitcoin: Bitcoin-Bubbles As Innovation Accelerators
Authors:
Tobias A. Huber,
Didier Sornette
Abstract:
Bitcoin represents one of the most interesting technological breakthroughs and socio-economic experiments of the last decades. In this paper, we examine the role of speculative bubbles in the process of Bitcoin's technological adoption by analyzing its social dynamics. We trace Bitcoin's genesis and dissect the nature of its techno-economic innovation. In particular, we present an analysis of the techno-economic feedback loops that drive Bitcoin's price and network effects. Based on our analysis of Bitcoin, we test and further refine the Social Bubble Hypothesis, which holds that bubbles constitute an essential component in the process of technological innovation. We argue that a hierarchy of repeating and exponentially increasing series of bubbles and hype cycles, which has occurred over the past decade since its inception, has bootstrapped Bitcoin into existence.
Submitted 13 May, 2020;
originally announced May 2020.
-
Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers (extended version)
Authors:
Su Feng,
Aaron Huber,
Boris Glavic,
Oliver Kennedy
Abstract:
Certain answers are a principled method for coping with uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Thus, users frequently resort to less principled approaches to resolve the uncertainty. In this paper, we propose Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-approximation of certain answers to achieve the reliability of certain answers, with the performance of a classical database system. Furthermore, in contrast to prior work on certain answers, UA-DBs achieve a higher utility by including some (explicitly marked) answers that are not certain. UA-DBs are based on incomplete K-relations, which we introduce to generalize the classical set-based notions of incomplete databases and certain answers to a much larger class of data models. Using an implementation of our approach, we demonstrate experimentally that it efficiently produces tight approximations of certain answers that are of high utility.
Submitted 30 March, 2019;
originally announced April 2019.
-
Efficient Black-Box Reductions for Separable Cost Sharing
Authors:
Tobias Harks,
Martin Hoefer,
Anja Huber,
Manuel Surek
Abstract:
In cost sharing games with delays, a set of agents jointly allocates a finite subset of resources. Each resource has a fixed cost that has to be shared by the players, and each agent has a nonshareable player-specific delay for each resource. A prominent example is uncapacitated facility location (UFL), where facilities need to be opened (at a shareable cost) and clients want to connect to opened facilities. Each client pays a cost share and his non-shareable physical connection cost. Given any profile of subsets allocated by the agents, a separable cost sharing protocol determines cost shares that satisfy budget balance on every resource and separability over the resources. Moreover, a separable protocol guarantees existence of pure Nash equilibria in the induced strategic game for the agents. In this paper, we study separable cost sharing protocols in several general combinatorial domains. We provide black-box reductions to reduce the design of a separable cost-sharing protocol to the design of an approximation algorithm for the underlying cost minimization problem. In this way, we obtain new separable cost-sharing protocols in games based on arbitrary player-specific matroids, single-source connection games without delays, and connection games on $n$-series-parallel graphs with delays. All these reductions are efficiently computable - given an initial allocation profile, we obtain a cheaper profile and separable cost shares turning the profile into a pure Nash equilibrium. Hence, in these domains any approximation algorithm can be used to obtain a separable cost sharing protocol with a price of stability bounded by the approximation factor.
Submitted 28 February, 2018;
originally announced February 2018.
-
Overcoming the vanishing gradient problem in plain recurrent networks
Authors:
Yuhuang Hu,
Adrian Huber,
Jithendar Anumula,
Shih-Chii Liu
Abstract:
Plain recurrent networks greatly suffer from the vanishing gradient problem while Gated Neural Networks (GNNs) such as Long-short Term Memory (LSTM) and Gated Recurrent Unit (GRU) deliver promising results in many sequence learning tasks through sophisticated network designs. This paper shows how we can address this problem in a plain recurrent network by analyzing the gating mechanisms in GNNs. We propose a novel network called the Recurrent Identity Network (RIN) which allows a plain recurrent network to overcome the vanishing gradient problem while training very deep models without the use of gates. We compare this model with IRNNs and LSTMs on multiple sequence modeling benchmarks. The RINs demonstrate competitive performance and converge faster in all tasks. Notably, small RIN models produce 12%--67% higher accuracy on the Sequential and Permuted MNIST datasets and reach state-of-the-art performance on the bAbI question answering dataset.
Submitted 5 July, 2019; v1 submitted 18 January, 2018;
originally announced January 2018.
-
A Characterization of Undirected Graphs Admitting Optimal Cost Shares
Authors:
Tobias Harks,
Anja Huber,
Manuel Surek
Abstract:
In a seminal paper, Chen, Roughgarden and Valiant studied cost sharing protocols for network design with the objective to implement a low-cost Steiner forest as a Nash equilibrium of an induced cost-sharing game. One of the most intriguing open problems to date is to understand the power of budget-balanced and separable cost sharing protocols in order to induce low-cost Steiner forests. In this work, we focus on undirected networks and analyze topological properties of the underlying graph so that an optimal Steiner forest can be implemented as a Nash equilibrium (by some separable cost sharing protocol) independent of the edge costs. We term a graph efficient if the above stated property holds. As our main result, we give a complete characterization of efficient undirected graphs for two-player network design games: an undirected graph is efficient if and only if it does not contain (at least) one out of few forbidden subgraphs. Our characterization implies that several graph classes are efficient: generalized series-parallel graphs, fan and wheel graphs and graphs with small cycles.
Submitted 4 October, 2017; v1 submitted 6 April, 2017;
originally announced April 2017.
-
User Experience of a Smart Factory Robot: Assembly Line Workers Demand Adaptive Robots
Authors:
Astrid Weiss,
Andreas Huber
Abstract:
This paper reports a case study on the User Experience (UX) of an industrial robotic prototype in the context of human-robot cooperation in an automotive assembly line. The goal was to find out what kinds of suggestions the assembly line workers, who actually use the new robotic system, propose in order to improve the human-robot interaction (HRI). The operators working with the robotic prototype were interviewed three weeks after the deployment using established UX narrative interview guidelines. Our results show that the cooperation with a robot that executes predefined working steps actually impedes the user in terms of flexibility and individual speed. This results in a change of working routine for the operators, impacts the UX, and potentially leads to a decrease in productivity. We present the results of the interviews as well as first thoughts on technical solutions in order to enhance the adaptivity and subsequently the UX of the human-robot cooperation.
Submitted 13 June, 2016;
originally announced June 2016.
-
Towards Minimizing k-Submodular Functions
Authors:
Anna Huber,
Vladimir Kolmogorov
Abstract:
In this paper we investigate k-submodular functions. This natural family of discrete functions includes submodular and bisubmodular functions as the special cases k = 1 and k = 2 respectively.
In particular we generalize the known Min-Max-Theorem for submodular and bisubmodular functions. This theorem asserts that the minimum of the (bi)submodular function can be found by solving a maximization problem over a (bi)submodular polyhedron. We define and investigate a k-submodular polyhedron and prove a Min-Max-Theorem for k-submodular functions.
Submitted 21 September, 2013;
originally announced September 2013.
-
Oracle Tractability of Skew Bisubmodular Functions
Authors:
Anna Huber,
Andrei Krokhin
Abstract:
In this paper we consider skew bisubmodular functions as introduced in [9]. We construct a convex extension of a skew bisubmodular function which we call Lovász extension in correspondence to the submodular case. We use this extension to show that skew bisubmodular functions given by an oracle can be minimised in polynomial time.
Submitted 29 August, 2013;
originally announced August 2013.
-
Strong Robustness of Randomized Rumor Spreading Protocols
Authors:
Benjamin Doerr,
Anna Huber,
Ariel Levavi
Abstract:
Randomized rumor spreading is a classical protocol to disseminate information across a network. At SODA 2008, a quasirandom version of this protocol was proposed and competitive bounds for its run-time were proven. This prompts the question: to what extent does the quasirandom protocol inherit the second principal advantage of randomized rumor spreading, namely robustness against transmission failures?
In this paper, we present a result precise up to $(1 \pm o(1))$ factors. We limit ourselves to the network in which every two vertices are connected by a direct link. Run-times accurate to their leading constants are unknown for all other non-trivial networks.
We show that if each transmission reaches its destination with a probability of $p \in (0,1]$, after $(1+\varepsilon)(\frac{1}{\log_2(1+p)}\log_2 n+\frac{1}{p}\ln n)$ rounds the quasirandom protocol has informed all $n$ nodes in the network with probability at least $1-n^{-p\varepsilon/40}$. Note that this is faster than the intuitively natural $1/p$ factor increase over the run-time of approximately $\log_2 n + \ln n$ for the non-corrupted case.
We also provide a corresponding lower bound for the classical model. This demonstrates that the quasirandom model is at least as robust as the fully random model despite the greatly reduced degree of independent randomness.
Submitted 3 October, 2012; v1 submitted 18 January, 2010;
originally announced January 2010.