-
Enumeration of Row-Column Designs
Authors:
Gerold Jäger,
Klas Markström,
Lars-Daniel Öhman,
Denys Shcherbak
Abstract:
We computationally completely enumerate a number of types of row-column designs up to isotopism, including double, sesqui and triple arrays as known from the literature, and two newly introduced types that we call mono arrays and AO-arrays. We calculate autotopism group sizes for the designs we generate. For larger parameter values, where complete enumeration is not feasible, we generate examples of some of the designs, and generate exhaustive lists of admissible parameters. For some admissible parameter sets, we prove non-existence results. We also give some explicit constructions of sesqui arrays, mono arrays and AO-arrays, and investigate connections to Youden rectangles and binary pseudo-Youden designs.
Submitted 24 June, 2024;
originally announced June 2024.
-
Using Unsupervised Learning to Explore Robot-Pedestrian Interactions in Urban Environments
Authors:
Sebastian Zug,
Georg Jäger,
Norman Seyffer,
Martin Plank,
Gero Licht,
Felix Wilhelm Siebert
Abstract:
This study identifies a gap in data-driven approaches to robot-centric pedestrian interactions and proposes a corresponding pipeline. The pipeline utilizes unsupervised learning techniques to identify patterns in interaction data of urban environments, specifically focusing on conflict scenarios. Analyzed features include the robot's and pedestrian's speed and contextual parameters such as proximity to intersections. They are extracted and reduced in dimensionality using Principal Component Analysis (PCA). Finally, K-means clustering is employed to uncover underlying patterns in the interaction data. A use case application of the pipeline is presented, utilizing real-world robot mission data from a mid-sized German city. The results indicate the need for enriching interaction representations with contextual information to enable fine-grained analysis and reasoning. They also highlight the need for expanding the data set and incorporating additional contextual factors to enhance the robot's situational awareness and interaction quality.
Submitted 20 May, 2024;
originally announced May 2024.
-
Computational Approaches for Integrating out Subjectivity in Cognate Synonym Selection
Authors:
Luise Häuser,
Gerhard Jäger,
Alexandros Stamatakis
Abstract:
Working with cognate data involves handling synonyms, that is, multiple words that describe the same concept in a language. In the early days of language phylogenetics it was recommended to select one synonym only. However, as we show here, binary character matrices, which are used as input for computational methods, do allow for representing the entire dataset including all synonyms. Here we address the question how one can and if one should include all synonyms or whether it is preferable to select synonyms a priori. To this end, we perform maximum likelihood tree inferences with the widely used RAxML-NG tool and show that it yields plausible trees when all synonyms are used as input. Furthermore, we show that a priori synonym selection can yield topologically substantially different trees and we therefore advise against doing so. To represent cognate data including all synonyms, we introduce two types of character matrices beyond the standard binary ones: probabilistic binary and probabilistic multi-valued character matrices. We further show that it is dataset-dependent for which character matrix type the inferred RAxML-NG tree is topologically closest to the gold standard. We also make available a Python interface for generating all of the above character matrix types for cognate data provided in CLDF format.
Submitted 5 June, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
Extending the definition of set tolerances
Authors:
Gerold Jäger,
Marcel Turkensteen
Abstract:
Optimal solutions of combinatorial optimization problems can be sensitive to changes in the cost of one or more elements. Single and set tolerances measure the largest / smallest possible change such that the current solution remains optimal and other solutions become non-optimal for cost changes in one or more elements, respectively. The current definition only applies to subsets of elements. In this paper, we broaden the definition to all elements, for single tolerances, and to all subsets of elements for set tolerances, while proving that key computational and theoretical properties still apply to the new definitions.
Submitted 22 February, 2024;
originally announced February 2024.
-
Are Sounds Sound for Phylogenetic Reconstruction?
Authors:
Luise Häuser,
Gerhard Jäger,
Taraka Rama,
Johann-Mattis List,
Alexandros Stamatakis
Abstract:
In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences.
Submitted 14 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Expanding Accurate Person Recognition to New Altitudes and Ranges: The BRIAR Dataset
Authors:
David Cornett III,
Joel Brogan,
Nell Barber,
Deniz Aykac,
Seth Baird,
Nick Burchfield,
Carl Dukes,
Andrew Duncan,
Regina Ferrell,
Jim Goddard,
Gavin Jager,
Matt Larson,
Bart Murphy,
Christi Johnson,
Ian Shelley,
Nisha Srinivas,
Brandon Stockwell,
Leanne Thompson,
Matt Yohe,
Robert Zhang,
Scott Dolvin,
Hector J. Santos-Villalobos,
David S. Bolme
Abstract:
Face recognition technology has advanced significantly in recent years due largely to the availability of large and increasingly complex training datasets for use in deep learning models. These datasets, however, typically comprise images scraped from news sites or social media platforms and, therefore, have limited utility in more advanced security, forensics, and military applications. These applications require lower resolution, longer ranges, and elevated viewpoints. To meet these critical needs, we collected and curated the first and second subsets of a large multi-modal biometric dataset designed for use in the research and development (R&D) of biometric recognition technologies under extremely challenging conditions. Thus far, the dataset includes more than 350,000 still images and over 1,300 hours of video footage of approximately 1,000 subjects. To collect this data, we used Nikon DSLR cameras, a variety of commercial surveillance cameras, specialized long-range R&D cameras, and Group 1 and Group 2 UAV platforms. The goal is to support the development of algorithms capable of accurately recognizing people at ranges up to 1,000 m and from high angles of elevation. These advances will include improvements to the state of the art in face recognition and will support new research in the area of whole-body recognition using methods based on gait and anthropometry. This paper describes methods used to collect and curate the dataset, and the dataset's characteristics at the current stage.
Submitted 3 November, 2022;
originally announced November 2022.
-
Can We Replicate Real Human Behaviour Using Artificial Neural Networks?
Authors:
Georg Jäger,
Daniel Reisinger
Abstract:
Agent-based modelling is a powerful tool when simulating human systems, yet when human behaviour cannot be described by simple rules or maximising one's own profit, we quickly reach the limits of this methodology. Machine learning has the potential to bridge this gap by providing a link between what people observe and how they act in order to reach their goal. In this paper we use a framework for agent-based modelling that utilizes human values like fairness, conformity and altruism. Using this framework we simulate a public goods game and compare to experimental results. We can report good agreement between simulation and experiment and furthermore find that the presented framework outperforms strict reinforcement learning. Both the framework and the utility function are generic enough that they can be used for arbitrary systems, which makes this method a promising candidate for a foundation of a universal agent-based model.
Submitted 20 January, 2022; v1 submitted 9 July, 2021;
originally announced July 2021.
-
Phylogenetic typology
Authors:
Gerhard Jäger,
Johannes Wahle
Abstract:
In this article we propose a novel method to estimate the frequency distribution of linguistic variables while controlling for statistical non-independence due to shared ancestry. Unlike previous approaches, our technique uses all available data, from language families large and small as well as from isolates, while controlling for different degrees of relatedness on a continuous scale estimated from the data. Our approach involves three steps: First, distributions of phylogenies are inferred from lexical data. Second, these phylogenies are used as part of a statistical model to statistically estimate transition rates between parameter states. Finally, the long-term equilibrium of the resulting Markov process is computed. As a case study, we investigate a series of potential word-order correlations across the languages of the world.
Submitted 19 March, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
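The final step named in the abstract above, computing the long-term equilibrium of a Markov process, can be illustrated for the simplest case of a binary linguistic variable. This is a minimal sketch, not the authors' implementation: the function name `two_state_equilibrium` and the rate names `rate_01` and `rate_10` are assumptions introduced here, and the transition rates are taken as given rather than estimated from phylogenies.

```python
def two_state_equilibrium(rate_01, rate_10):
    """Equilibrium of a two-state continuous-time Markov chain.

    rate_01: hypothetical rate of change from state 0 to state 1
    rate_10: hypothetical rate of change from state 1 to state 0
    Returns (probability of state 0, probability of state 1).
    """
    total = rate_01 + rate_10
    if rate_01 < 0 or rate_10 < 0 or total == 0:
        raise ValueError("rates must be non-negative and not both zero")
    # Balance condition: pi_0 * rate_01 == pi_1 * rate_10, with pi_0 + pi_1 == 1.
    return (rate_10 / total, rate_01 / total)


if __name__ == "__main__":
    # Leaving state 1 is three times as fast as leaving state 0,
    # so state 0 dominates in the long run.
    print(two_state_equilibrium(1.0, 3.0))
```

The equilibrium depends only on the ratio of the two rates, which is why it can serve as an estimate of a frequency distribution that is independent of the current, historically contingent sample of languages.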
-
The Mertens Unrolled Network (MU-Net): A High Dynamic Range Fusion Neural Network for Through the Windshield Driver Recognition
Authors:
Max Ruby,
David S. Bolme,
Joel Brogan,
David Cornett III,
Baldemar Delgado,
Gavin Jager,
Christi Johnson,
Jose Martinez-Mendoza,
Hector Santos-Villalobos,
Nisha Srinivas
Abstract:
Face recognition of vehicle occupants through windshields in unconstrained environments poses a number of unique challenges ranging from glare, poor illumination, driver pose and motion blur. In this paper, we further develop the hardware and software components of a custom vehicle imaging system to better overcome these challenges. After the build out of a physical prototype system that performs High Dynamic Range (HDR) imaging, we collect a small dataset of through-windshield image captures of known drivers. We then re-formulate the classical Mertens-Kautz-Van Reeth HDR fusion algorithm as a pre-initialized neural network, which we name the Mertens Unrolled Network (MU-Net), for the purpose of fine-tuning the HDR output of through-windshield images. Reconstructed faces from this novel HDR method are then evaluated and compared against other traditional and experimental HDR methods in a pre-trained state-of-the-art (SOTA) facial recognition pipeline, verifying the efficacy of our approach.
Submitted 27 February, 2020;
originally announced February 2020.
-
Enumeration of Sets of Mutually Orthogonal Latin Rectangles
Authors:
Gerold Jäger,
Klas Markström,
Denys Shcherbak,
Lars-Daniel Öhman
Abstract:
We study sets of mutually orthogonal Latin rectangles (MOLR), and a natural variation of the concept of self-orthogonal Latin squares which is applicable on larger sets of mutually orthogonal Latin squares and MOLR, namely that each Latin rectangle in a set of MOLR is isotopic to each other rectangle in the set. We call such a set of MOLR \emph{homogeneous}. In the course of doing this, we perform a complete enumeration of non-isotopic sets of $t$ mutually orthogonal $k\times n$ Latin rectangles for $k\leq n \leq 7$, for all $t < n$. Specifically, we keep track of homogeneous sets of MOLR, as well as sets of MOLR where the autotopism group acts transitively on the rectangles, and we call such sets of MOLR \emph{transitive}. We build the sets of MOLR row by row, and in this process we also keep track of which of the MOLR are homogeneous and/or transitive in each step of the construction process. We use the prefix \emph{stepwise} to refer to sets of MOLR with this property. Sets of MOLR are connected to other discrete objects, notably finite geometries and certain regular graphs. Here we observe that all projective planes of order at most 9 except the Hughes plane can be constructed from a stepwise transitive MOLR.
Submitted 19 January, 2024; v1 submitted 7 October, 2019;
originally announced October 2019.
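For readers unfamiliar with the objects enumerated in the abstract above, the defining conditions of a set of MOLR can be checked directly. The sketch below is illustrative only and is unrelated to the authors' enumeration code; the function names are assumptions introduced here, and symbols are assumed to be the integers 0 to n-1.

```python
from itertools import combinations


def is_latin_rectangle(rect, n):
    """Each row is a permutation of 0..n-1 and no column repeats a symbol."""
    symbols = set(range(n))
    rows_ok = all(len(row) == n and set(row) == symbols for row in rect)
    cols_ok = all(len(set(col)) == len(rect) for col in zip(*rect))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """Superimposing a and b must never produce the same ordered pair twice."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return len(set(pairs)) == len(pairs)


def is_molr(rects, n):
    """All members are Latin rectangles and every two of them are orthogonal."""
    return all(is_latin_rectangle(r, n) for r in rects) and all(
        are_orthogonal(a, b) for a, b in combinations(rects, 2)
    )


if __name__ == "__main__":
    a = [[0, 1, 2], [1, 2, 0]]
    b = [[0, 1, 2], [2, 0, 1]]
    print(is_molr([a, b], 3))  # a pair of orthogonal 2 x 3 Latin rectangles
```

A rectangle is never orthogonal to itself, since every pair on the diagonal of the superposition has the form (x, x) and repeats across rows.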
-
Small Youden Rectangles, Near Youden Rectangles, and Their Connections to Other Row-Column Designs
Authors:
Gerold Jäger,
Klas Markström,
Denys Shcherbak,
Lars-Daniel Öhman
Abstract:
In this paper we first study $k \times n$ Youden rectangles of small orders. We have enumerated all Youden rectangles for a range of small parameter values, excluding the almost square cases where $k = n-1$, in a large scale computer search. In particular, we verify the previous counts for $(n,k) = (7,3), (7,4)$, and extend this to the cases $(11,5), (11,6), (13,4)$ and $(21,5)$. For small parameter values where no Youden rectangles exist, we also enumerate rectangles where the number of symbols common to two columns is always one of two possible values, differing by 1, which we call \emph{near Youden rectangles}. For all the designs we generate, we calculate the order of the autotopism group and investigate to which degree a certain transformation can yield other row-column designs, namely double arrays, triple arrays and sesqui arrays. Finally, we also investigate certain Latin rectangles with three possible pairwise intersection sizes for the columns and demonstrate that these can give rise to triple and sesqui arrays which cannot be obtained from Youden rectangles, using the transformation mentioned above.
Submitted 27 February, 2023; v1 submitted 7 October, 2019;
originally announced October 2019.
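The distinction drawn in the abstract above between Youden rectangles and near Youden rectangles comes down to the sizes of pairwise column intersections. The following sketch assumes its input is already a $k \times n$ Latin rectangle and only inspects those intersection sizes; the function names and the cyclic $(7,3)$ example built from the set {0, 1, 3} modulo 7 are choices made here for illustration, not taken from the paper.

```python
from itertools import combinations


def column_intersection_sizes(rect):
    """Set of values taken by the number of symbols shared by two columns."""
    columns = [set(col) for col in zip(*rect)]
    return {len(c1 & c2) for c1, c2 in combinations(columns, 2)}


def classify(rect):
    """Classify a Latin rectangle by its pairwise column intersection sizes."""
    sizes = column_intersection_sizes(rect)
    if len(sizes) == 1:
        return "Youden"
    if len(sizes) == 2 and max(sizes) - min(sizes) == 1:
        return "near Youden"
    return "neither"


if __name__ == "__main__":
    # Row d holds the symbols (j + d) mod 7 for columns j = 0..6.
    cyclic_7_3 = [[(j + d) % 7 for j in range(7)] for d in (0, 1, 3)]
    print(classify(cyclic_7_3))
```

In the cyclic example every two columns share exactly one symbol, which matches the value $k(k-1)/(n-1) = 1$ expected for $(n,k) = (7,3)$.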
-
From Multi-modal Property Dataset to Robot-centric Conceptual Knowledge About Household Objects
Authors:
Madhura Thosar,
Christian A. Mueller,
Georg Jaeger,
Johannes Schleiss,
Narender Pulugu,
Ravi Mallikarjun Chennaboina,
Sai Vivek Jeevangekar,
Andreas Birk,
Max Pfingsthorn,
Sebastian Zug
Abstract:
Tool-use applications in robotics require conceptual knowledge about objects for informed decision making and object interactions. State-of-the-art methods employ hand-crafted symbolic knowledge which is defined from a human perspective and grounded into sensory data afterwards. However, due to different sensing and acting capabilities of robots, their conceptual understanding of objects must be generated from a robot's perspective entirely, which asks for robot-centric conceptual knowledge about objects. With this goal in mind, this article motivates that such knowledge should be based on physical and functional properties of objects. Consequently, a selection of ten properties is defined and corresponding extraction methods are proposed. This multi-modal property extraction forms the basis on which our second contribution, a robot-centric knowledge generation, is built. It employs unsupervised clustering methods to transform numerical property data into symbols, and Bivariate Joint Frequency Distributions and Sample Proportion to generate conceptual knowledge about objects using the robot-centric symbols. A preliminary implementation of the proposed framework is employed to acquire a dataset comprising physical and functional property data of 110 household objects. This Robot-Centric dataSet (RoCS) is used to evaluate the framework regarding the property extraction methods, the semantics of the considered properties within the dataset and its usefulness in real-world applications such as tool substitution.
Submitted 26 June, 2019;
originally announced June 2019.
-
Triples of Orthogonal Latin and Youden Rectangles For Small Orders
Authors:
Gerold Jäger,
Klas Markström,
Lars-Daniel Öhman,
Denys Shcherbak
Abstract:
We have performed a complete enumeration of non-isotopic triples of mutually orthogonal $k\times n$ Latin rectangles for $k\leq n \leq 7$. Here we will present a census of such triples, classified by various properties, including the order of the autotopism group of the triple. As part of this we have also achieved the first enumeration of pairwise orthogonal triples of Youden rectangles. We have also studied orthogonal triples of $k \times 8$ rectangles which are formed by extending mutually orthogonal triples with non-trivial autotopisms one row at a time, and requiring that the autotopism group is non-trivial in each step. This class includes a triple coming from the projective plane of order 8. Here we find a remarkably symmetrical pair of triples of $4 \times 8$ rectangles, formed by juxtaposing two selected copies of complete sets of MOLS of order 4.
Submitted 30 October, 2018;
originally announced October 2018.
-
Towards Robot-Centric Conceptual Knowledge Acquisition
Authors:
Georg Jäger,
Christian A. Mueller,
Madhura Thosar,
Sebastian Zug,
Andreas Birk
Abstract:
Robots require knowledge about objects in order to efficiently perform various household tasks involving objects. The existing knowledge bases for robots acquire symbolic knowledge about objects from manually-coded external common sense knowledge bases such as ConceptNet, WordNet, etc. The problem with such approaches is the discrepancy between human-centric symbolic knowledge and robot-centric object perception due to the robot's limited perception capabilities. Ultimately, a significant portion of the knowledge in the knowledge base remains ungrounded in the robot's perception. To overcome this discrepancy, we propose an approach to enable robots to generate robot-centric symbolic knowledge about objects from their own sensory data, thus allowing them to assemble their own conceptual understanding of objects. With this goal in mind, the presented paper elaborates on the work-in-progress of the proposed approach followed by the preliminary results.
Submitted 8 October, 2018;
originally announced October 2018.
-
Computational Historical Linguistics
Authors:
Gerhard Jäger
Abstract:
Computational approaches to historical linguistics have been proposed for half a century. Within the last decade, this line of research has received a major boost, owing both to the transfer of ideas and software from computational biology and to the release of several large electronic data resources suitable for systematic comparative work.
In this article, some of the central research topics of this new wave of computational historical linguistics are introduced and discussed. These are automatic assessment of genetic relatedness, automatic cognate detection, phylogenetic inference and ancestral state reconstruction. They will be demonstrated by means of a case study of automatically reconstructing a Proto-Romance word list from lexical data of 50 modern Romance languages and dialects.
Submitted 21 May, 2018;
originally announced May 2018.
-
Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics?
Authors:
Taraka Rama,
Johann-Mattis List,
Johannes Wahle,
Gerhard Jäger
Abstract:
We evaluate the performance of state-of-the-art algorithms for automatic cognate detection by comparing how useful automatically inferred cognates are for the task of phylogenetic inference compared to classical manually annotated cognate sets. Our findings suggest that phylogenies inferred from automated cognate sets come close to phylogenies inferred from expert-annotated ones, although on average, the latter are still superior. We conclude that future work on phylogenetic reconstruction can profit much from automatic cognate detection. Especially where scholars are merely interested in exploring the bigger picture of a language family's phylogeny, algorithms for automatic cognate detection are a useful complement for current research on language phylogenies.
Submitted 15 April, 2018;
originally announced April 2018.
-
Global-scale phylogenetic linguistic inference from lexical resources
Authors:
Gerhard Jäger
Abstract:
Automatic phylogenetic inference plays an increasingly important role in computational historical linguistics. Most pertinent work is currently based on expert cognate judgments. This limits the scope of this approach to a small number of well-studied language families. We used machine learning techniques to compile data suitable for phylogenetic inference from the ASJP database, a collection of almost 7,000 phonetically transcribed word lists over 40 concepts, covering two thirds of the extant world-wide linguistic diversity. First, we estimated Pointwise Mutual Information scores between sound classes using weighted sequence alignment and general-purpose optimization. From this we computed a dissimilarity matrix over all ASJP word lists. This matrix is suitable for distance-based phylogenetic inference. Second, we applied cognate clustering to the ASJP data, using supervised training of an SVM classifier on expert cognacy judgments. Third, we defined two types of binary characters, based on automatically inferred cognate classes and on sound-class occurrences. Several tests are reported demonstrating the suitability of these characters for character-based phylogenetic inference.
Submitted 17 February, 2018;
originally announced February 2018.
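The Pointwise Mutual Information scores mentioned in the abstract above compare how often two sound classes are aligned with how often they would be aligned by chance. The sketch below shows only that textbook definition on a list of already aligned symbol pairs. It is an assumption-laden simplification: the function name `pmi_scores` and the toy data are invented here, pairs are treated as ordered, and the weighted alignment and optimization steps described in the abstract are not reproduced.

```python
import math
from collections import Counter


def pmi_scores(aligned_pairs):
    """PMI for each observed (left, right) symbol pair.

    PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), with all probabilities
    estimated as relative frequencies in aligned_pairs.
    """
    total = len(aligned_pairs)
    pair_counts = Counter(aligned_pairs)
    left_counts = Counter(a for a, _ in aligned_pairs)
    right_counts = Counter(b for _, b in aligned_pairs)
    return {
        (a, b): math.log(
            (count / total) / ((left_counts[a] / total) * (right_counts[b] / total))
        )
        for (a, b), count in pair_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical alignment columns; not taken from ASJP.
    toy_pairs = [("p", "p"), ("p", "p"), ("p", "t"),
                 ("t", "t"), ("t", "t"), ("t", "p")]
    for pair, score in sorted(pmi_scores(toy_pairs).items()):
        print(pair, round(score, 3))
```

Positive scores mark pairs that co-occur more often than independence would predict, and negative scores mark pairs that co-occur less often.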
-
Fast and unsupervised methods for multilingual cognate clustering
Authors:
Taraka Rama,
Johannes Wahle,
Pavel Sofroniev,
Gerhard Jäger
Abstract:
In this paper we explore the use of unsupervised methods for detecting cognates in multilingual word lists. We use online EM to train sound segment similarity weights for computing similarity between two words. We tested our online systems on sixteen geographically spread language groups of the world and show that the Online PMI system (Pointwise Mutual Information) outperforms an HMM-based system and two linguistically motivated systems: LexStat and ALINE. Our results suggest that a PMI system trained in an online fashion can be used by historical linguists for fast and accurate identification of cognates in less well-studied language families.
Submitted 16 February, 2017;
originally announced February 2017.
-
inPHAP: Interactive visualization of genotype and phased haplotype data
Authors:
Günter Jäger,
Alexander Peltzer,
Kay Nieselt
Abstract:
Background: To understand individual genomes it is necessary to look at the variations that lead to changes in phenotype and possibly to disease. However, genotype information alone is often not sufficient and additional knowledge regarding the phase of the variation is needed to make correct interpretations. Interactive visualizations that allow the user to explore the data in various ways can be of great assistance in the process of making well-informed decisions. Currently, however, there is a lack of visualizations that are able to deal with phased haplotype data. Results: We present inPHAP, an interactive visualization tool for genotype and phased haplotype data. inPHAP features a variety of interaction possibilities such as zooming, sorting, filtering and aggregation of rows in order to explore patterns hidden in large genetic data sets. As a proof of concept, we apply inPHAP to the phased haplotype data set of Phase 1 of the 1000 Genomes Project. Thereby, inPHAP's ability to show genetic variations at the population level as well as at the individual level is demonstrated for several disease-related loci. Conclusions: As of today, inPHAP is the only visual analytical tool that allows the user to explore unphased and phased haplotype data interactively. Due to its highly scalable design, inPHAP can be applied to large datasets with up to 100 GB of data, enabling users to visualize even large-scale input data. inPHAP closes the gap between common visualization tools for unphased genotype data and introduces several new features, such as the visualization of phased data.
Submitted 8 July, 2014;
originally announced July 2014.
-
An Effective Algorithm for and Phase Transitions of the Directed Hamiltonian Cycle Problem
Authors:
Gerold Jäger,
Weixiong Zhang
Abstract:
The Hamiltonian cycle problem (HCP) is an important combinatorial problem with applications in many areas. It is among the first problems used for studying intrinsic properties, including phase transitions, of combinatorial problems. While thorough theoretical and experimental analyses have been made on the HCP in undirected graphs, a limited amount of work has been done for the HCP in directed graphs (DHCP).
The main contribution of this work is an effective algorithm for the DHCP. Our algorithm explores and exploits the close relationship between the DHCP and the Assignment Problem (AP) and utilizes a technique based on Boolean satisfiability (SAT). By combining effective algorithms for the AP and SAT, our algorithm significantly outperforms previous exact DHCP algorithms, including an algorithm based on the award-winning Concorde TSP algorithm. The second result of the current study is an experimental analysis of phase transitions of the DHCP, verifying and refining a known phase transition of the DHCP.
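The link between the DHCP and the AP mentioned above can be illustrated with a small sketch. It relies only on a standard fact: a feasible AP solution on a directed graph assigns each vertex exactly one successor, which yields a cycle cover, and that cover is a Hamiltonian cycle exactly when it consists of a single cycle. The function name and the list-based input format are assumptions made for this illustration; this is not the authors' algorithm.

```python
def is_single_cycle(successor):
    """Return True if the assignment forms one cycle through all vertices.

    `successor[i]` is the vertex assigned to follow vertex i (hypothetical
    input format). The list must be a permutation of 0..n-1.
    """
    n = len(successor)
    if n == 0 or sorted(successor) != list(range(n)):
        return False  # not a valid assignment
    # Walk from vertex 0 until we return to it, counting the steps.
    node, steps = successor[0], 1
    while node != 0:
        node = successor[node]
        steps += 1
    # The cover is Hamiltonian only if the walk visited every vertex.
    return steps == n
```

An assignment that splits into several subcycles fails this check; how such subcycles are then excluded (for example via SAT, as the abstract indicates) is not specified here.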
Submitted 16 January, 2014;
originally announced January 2014.
-
The Worst Case Number of Questions in Generalized AB Game with and without White-peg Answers
Authors:
Gerold Jäger,
Marcin Peczarski
Abstract:
The AB game is a two-player game, where the codemaker has to choose a secret code and the codebreaker has to guess it in as few questions as possible. It is a variant of the famous Mastermind game, with the only difference that all pegs in both the secret and the questions must have distinct colors. In this work, we consider the Generalized AB game, where for given arbitrary numbers $p$, $c$ with $p \le c$ the secret code consists of $p$ pegs each having one of $c$ colors and the answer consists only of a number of black and white pegs. Here, the number of black pegs equals the number of pegs matching in the corresponding question and the secret in position and color, and the number of white pegs equals the additional number of pegs matching in the corresponding question and the secret only in color. We also consider a variant of the Generalized AB game, where the information of white pegs is omitted. This variant is called the Generalized Black-peg AB game. Let $\ab(p,c)$ and $\abb(p,c)$ be the worst case number of questions for the Generalized AB game and the Generalized Black-peg AB game, respectively. Combining a computer program with theoretical considerations, we confirm known exact values of $\ab(2,c)$ and $\ab(3,c)$ and prove tight bounds for $\ab(4,c)$. Furthermore, we present exact values for $\abb(2,c)$ and $\abb(3,c)$ and tight bounds for $\abb(4,c)$.
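The answer rule described above can be sketched in a few lines. This is a minimal illustration of the black-peg and white-peg counts as defined in the abstract, assuming colors are encoded as integers; the function name `ab_answer` and the tuple encoding are hypothetical and not taken from the paper.

```python
def ab_answer(secret, question):
    """Return (black, white) for one question in the Generalized AB game.

    Both arguments are sequences of p pairwise distinct colors
    (hypothetical encoding: integers in range(c)).
    """
    if len(secret) != len(question):
        raise ValueError("secret and question must have the same length")
    if len(set(secret)) != len(secret) or len(set(question)) != len(question):
        raise ValueError("all pegs must have distinct colors")
    # Black pegs: same color in the same position.
    black = sum(s == q for s, q in zip(secret, question))
    # White pegs: shared colors that are not already counted as black.
    # Set intersection suffices because colors never repeat.
    white = len(set(secret) & set(question)) - black
    return black, white
```

For the Generalized Black-peg AB game, only the first component of the returned pair would be reported to the codebreaker.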
Submitted 7 June, 2013;
originally announced June 2013.