-
Neural Acceleration of Incomplete Cholesky Preconditioners
Authors:
Joshua Dennis Booth,
Hongyang Sun,
Trevor Garnett
Abstract:
The solution of a sparse system of linear equations is ubiquitous in scientific applications. Iterative methods, such as the Preconditioned Conjugate Gradient method (PCG), are normally chosen over direct methods due to memory and computational complexity constraints. However, the efficiency of these methods depends on the preconditioner utilized. The development of the preconditioner normally requires some insight into the sparse linear system and the desired trade-off of generating the preconditioner and the reduction in the number of iterations. Incomplete factorization methods tend to be black box methods to generate these preconditioners but may fail for a number of reasons. These reasons include numerical issues that require searching for adequate scaling, shifting, and fill-in while utilizing a difficult to parallelize algorithm. With a move towards heterogeneous computing, many sparse applications find GPUs that are optimized for dense tensor applications like training neural networks being underutilized. In this work, we demonstrate that a simple artificial neural network trained either at compile time or in parallel to the running application on a GPU can provide an incomplete sparse Cholesky factorization that can be used as a preconditioner. This generated preconditioner is as good or better in terms of reduction of iterations than the one found using multiple preconditioning techniques such as scaling and shifting. Moreover, the generated method never fails to produce a preconditioner that reduces the iteration count.
Submitted 1 March, 2024;
originally announced March 2024.
-
Symbolic Learning for Material Discovery
Authors:
Daniel Cunnington,
Flaviu Cipcigan,
Rodrigo Neumann Barros Ferreira,
Jonathan Booth
Abstract:
Discovering new materials is essential to solve challenges in climate change, sustainability and healthcare. A typical task in materials discovery is to search for a material in a database which maximises the value of a function. That function is often expensive to evaluate, and can rely upon a simulation or an experiment. Here, we introduce SyMDis, a sample efficient optimisation method based on symbolic learning, that discovers near-optimal materials in a large database. SyMDis performs comparably to a state-of-the-art optimiser, whilst learning interpretable rules to aid physical and chemical verification. Furthermore, the rules learned by SyMDis generalise to unseen datasets and return high performing candidates in a zero-shot evaluation, which is difficult to achieve with other approaches.
Submitted 30 November, 2023;
originally announced December 2023.
-
BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology
Authors:
Odhran O'Donoghue,
Aleksandar Shtedritski,
John Ginger,
Ralph Abboud,
Ali Essa Ghareeb,
Justin Booth,
Samuel G Rodriques
Abstract:
The ability to automatically generate accurate protocols for scientific experiments would represent a major step towards the automation of science. Large Language Models (LLMs) have impressive capabilities on a wide range of tasks, such as question answering and the generation of coherent text and code. However, LLMs can struggle with multi-step problems and long-term planning, which are crucial for designing scientific experiments. Moreover, evaluation of the accuracy of scientific protocols is challenging, because experiments can be described correctly in many different ways, require expert knowledge to evaluate, and cannot usually be executed automatically. Here we present an automatic evaluation framework for the task of planning experimental protocols, and we introduce BioProt: a dataset of biology protocols with corresponding pseudocode representations. To measure performance on generating scientific protocols, we use an LLM to convert a natural language protocol into pseudocode, and then evaluate an LLM's ability to reconstruct the pseudocode from a high-level description and a list of admissible pseudocode functions. We evaluate GPT-3 and GPT-4 on this task and explore their robustness. We externally validate the utility of pseudocode representations of text by generating accurate novel protocols using retrieved pseudocode, and we run a generated protocol successfully in our biological laboratory. Our framework is extensible to the evaluation and improvement of language model planning abilities in other areas of science or other areas that lack automatic evaluation.
Submitted 16 October, 2023;
originally announced October 2023.
-
Discovery of Novel Reticular Materials for Carbon Dioxide Capture using GFlowNets
Authors:
Flaviu Cipcigan,
Jonathan Booth,
Rodrigo Neumann Barros Ferreira,
Carine Ribeiro dos Santos,
Mathias Steiner
Abstract:
Artificial intelligence holds promise to improve materials discovery. GFlowNets are an emerging deep learning algorithm with many applications in AI-assisted discovery. By using GFlowNets, we generate porous reticular materials, such as metal organic frameworks and covalent organic frameworks, for applications in carbon dioxide capture. We introduce a new Python package (matgfn) to train and sample GFlowNets. We use matgfn to generate the matgfn-rm dataset of novel and diverse reticular materials with gravimetric surface area above 5000 m$^2$/g. We calculate single- and two-component gas adsorption isotherms for the top-100 candidates in matgfn-rm. These candidates are novel compared to the state-of-art ARC-MOF dataset and rank in the 90th percentile in terms of working capacity compared to the CoRE2019 dataset. We discover 15 materials outperforming all materials in CoRE2019.
Submitted 16 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
DarSIA: An open-source Python toolbox for two-scale image processing of dynamics in porous media
Authors:
Jan Martin Nordbotten,
Benyamine Benali,
Jakub Wiktor Both,
Bergit Brattekås,
Erlend Storvik,
Martin A. Fernø
Abstract:
Understanding porous media flow is inherently a multi-scale challenge, where at the core lies the aggregation of pore-level processes to a continuum, or Darcy-scale, description. This challenge is directly mirrored in image processing, where grains and interfaces may be clearly visible, yet continuous parameters are desirable to measure. Classical image processing is poorly adapted to this setting, as most techniques do not explicitly utilize the fact that the image contains explicit physical processes.
Here, we adapt classical image processing concepts to what we define as physical images of porous materials and processes within them. This is realized through the development of a new open-source image analysis toolbox specifically adapted to time-series of images of porous materials.
Submitted 13 January, 2023;
originally announced January 2023.
-
Real2Sim2Real Transfer for Control of Cable-driven Robots via a Differentiable Physics Engine
Authors:
Kun Wang,
William R. Johnson III,
Shiyang Lu,
Xiaonan Huang,
Joran Booth,
Rebecca Kramer-Bottiglio,
Mridul Aanjaneya,
Kostas Bekris
Abstract:
Tensegrity robots, composed of rigid rods and flexible cables, exhibit high strength-to-weight ratios and significant deformations, which enable them to navigate unstructured terrains and survive harsh impacts. They are hard to control, however, due to high dimensionality, complex dynamics, and a coupled architecture. Physics-based simulation is a promising avenue for developing locomotion policies that can be transferred to real robots. Nevertheless, modeling tensegrity robots is a complex task due to a substantial sim2real gap. To address this issue, this paper describes a Real2Sim2Real (R2S2R) strategy for tensegrity robots. This strategy is based on a differentiable physics engine that can be trained given limited data from a real robot. These data include offline measurements of physical properties, such as mass and geometry for various robot components, and the observation of a trajectory using a random control policy. With the data from the real robot, the engine can be iteratively refined and used to discover locomotion policies that are directly transferable to the real robot. Beyond the R2S2R pipeline, key contributions of this work include computing non-zero gradients at contact points, a loss function for matching tensegrity locomotion gaits, and a trajectory segmentation technique that avoids conflicts in gradient evaluation during training. Multiple iterations of the R2S2R process are demonstrated and evaluated on a real 3-bar tensegrity robot.
Submitted 17 September, 2023; v1 submitted 13 September, 2022;
originally announced September 2022.
-
6N-DoF Pose Tracking for Tensegrity Robots
Authors:
Shiyang Lu,
William R. Johnson III,
Kun Wang,
Xiaonan Huang,
Joran Booth,
Rebecca Kramer-Bottiglio,
Kostas Bekris
Abstract:
Tensegrity robots, which are composed of compressive elements (rods) and flexible tensile elements (e.g., cables), have a variety of advantages, including flexibility, low weight, and resistance to mechanical impact. Nevertheless, the hybrid soft-rigid nature of these robots also complicates the ability to localize and track their state. This work aims to address what has been recognized as a grand challenge in this domain, i.e., the state estimation of tensegrity robots through a markerless, vision-based method, as well as novel, onboard sensors that can measure the length of the robot's cables. In particular, an iterative optimization process is proposed to track the 6-DoF pose of each rigid element of a tensegrity robot from an RGB-D video as well as endcap distance measurements from the cable sensors. To ensure that the pose estimates of rigid elements are physically feasible, i.e., they are not resulting in collisions between rods or with the environment, physical constraints are introduced during the optimization. Real-world experiments are performed with a 3-bar tensegrity robot, which performs locomotion gaits. Given ground truth data from a motion capture system, the proposed method achieves less than 1 cm translation error and 3 degrees rotation error, which significantly outperforms alternatives. At the same time, the approach can provide accurate pose estimation throughout the robot's motion, while motion capture often fails due to occlusions.
Submitted 13 October, 2022; v1 submitted 29 May, 2022;
originally announced May 2022.
-
Heterogeneous Sparse Matrix-Vector Multiplication via Compressed Sparse Row Format
Authors:
Phillip Allen Lane,
Joshua Dennis Booth
Abstract:
Sparse matrix-vector multiplication (SpMV) is one of the most important kernels in high-performance computing (HPC), yet SpMV normally suffers from ill performance on many devices. Due to ill performance, SpMV normally requires special care to store and tune for a given device. Moreover, HPC is facing heterogeneous hardware containing multiple different compute units, e.g., many-core CPUs and GPUs. Therefore, an emerging goal has been to produce heterogeneous formats and methods that allow critical kernels, e.g., SpMV, to be executed on different devices with portable performance and minimal changes to format and method. This paper presents a heterogeneous format based on CSR, named CSR-k, that can be tuned quickly and outperforms the average performance of Intel MKL on Intel Xeon Platinum 8380 and AMD Epyc 7742 CPUs while still outperforming NVIDIA's cuSPARSE and Sandia National Laboratories' KokkosKernels on NVIDIA A100 and V100 for regular sparse matrices, i.e., sparse matrices where the number of nonzeros per row has a variance $\leq$ 10, such as those commonly generated from two and three-dimensional finite difference and element problems. In particular, CSR-k achieves this with reordering and by grouping rows into a hierarchical structure of super-rows and super-super-rows that are represented by just a few extra arrays of pointers. Due to its simplicity, a model can be tuned for a device and used to select super-row and super-super-rows sizes in constant time.
Submitted 6 January, 2023; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Soft Lattice Modules that Behave Independently and Collectively
Authors:
Luyang Zhao,
Yijia Wu,
Julien Blanchet,
Maxine Perroni-Scharf,
Xiaonan Huang,
Joran Booth,
Rebecca Kramer-Bottiglio,
Devin Balkcom
Abstract:
Natural systems integrate the work of many sub-units (cells) toward a large-scale unified goal (morphological and behavioral), which can counteract the effects of unexpected experiences, damage, or simply changes in task demands. In this paper, we exploit the opportunities presented by soft, modular, and tensegrity robots to introduce soft lattice modules that parallel the sub-units seen in biological systems. The soft lattice modules are comprised of 3D printed plastic "skeletons", linear contracting shape memory alloy spring actuators, and permanent magnets that enable adhesion between modules. The soft lattice modules are capable of independent locomotion, and can also join with other modules to achieve collective, self-assembled, larger scale tasks such as collective locomotion and moving an object across the surface of the lattice assembly. This work represents a preliminary step toward soft modular systems capable of independent and collective behaviors, and provides a platform for future studies on distributed control.
Submitted 8 March, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
An Adaptive Self-Scheduling Loop Scheduler
Authors:
Joshua Dennis Booth,
Phillip Lane
Abstract:
Many shared-memory parallel irregular applications, such as sparse linear algebra and graph algorithms, depend on efficient loop scheduling (LS) in a fork-join manner despite that the work per loop iteration can greatly vary depending on the application and the input. Because of its importance, many different methods, e.g., workload-aware self-scheduling, and parameters, e.g., chunk size, have been explored to achieve reasonable performance that requires expert prior knowledge about the application and input. This work proposes a new LS method that requires little to no expert knowledge to achieve speedups close to those of tuned LS methods by self-managing chunk size based on a heuristic of workload variance and using work-stealing. This method, named \ichunk, is implemented into libgomp for testing. It is evaluated against OpenMP's guided, dynamic, and taskloop methods and is evaluated against BinLPT and generic work-stealing on an array of applications that includes: a synthetic benchmark, breadth-first search, K-Means, the molecular dynamics code LavaMD, and sparse matrix-vector multiplication. On a 28-thread Intel system, \ichunk is the only method to always be one of the top three LS methods. On average across all applications, \ichunk is within 5.4% of the best method and is even able to outperform other LS methods for breadth-first search and K-Means.
Submitted 28 October, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Realistic Physics Based Character Controller
Authors:
Joe Booth,
Vladimir Ivanov
Abstract:
Over the course of the last several years there was a strong interest in application of modern optimal control techniques to the field of character animation. This interest was fueled by introduction of efficient learning based algorithms for policy optimization, growth in computation power, and game engine improvements. It was shown that it is possible to generate natural looking control of a character by using two ingredients. First, the simulated agent must adhere to a motion capture dataset. And second, the character aims to track the control input from the user. The paper aims at closing the gap between the researchers and users by introducing an open source implementation of physics based character control in Unity framework that has a low entry barrier and a steep learning curve.
Submitted 12 June, 2020;
originally announced June 2020.
-
Practical sensorless aberration estimation for 3D microscopy with deep learning
Authors:
Debayan Saha,
Uwe Schmidt,
Qinrong Zhang,
Aurelien Barbotin,
Qi Hu,
Na Ji,
Martin J. Booth,
Martin Weigert,
Eugene W. Myers
Abstract:
Estimation of optical aberrations from volumetric intensity images is a key step in sensorless adaptive optics for 3D microscopy. Recent approaches based on deep learning promise accurate results at fast processing speeds. However, collecting ground truth microscopy data for training the network is typically very difficult or even impossible thereby limiting this approach in practice. Here, we demonstrate that neural networks trained only on simulated data yield accurate predictions for real experimental images. We validate our approach on simulated and experimental datasets acquired with two different microscopy modalities, and also compare the results to non-learned methods. Additionally, we study the predictability of individual aberrations with respect to their data requirements and find that the symmetry of the wavefront plays a crucial role. Finally, we make our implementation freely available as open source software in Python.
Submitted 5 July, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
ViWi Vision-Aided mmWave Beam Tracking: Dataset, Task, and Baseline Solutions
Authors:
Muhammad Alrabeiah,
Jayden Booth,
Andrew Hredzak,
Ahmed Alkhateeb
Abstract:
Vision-aided wireless communication is motivated by the recent advances in deep learning and computer vision as well as the increasing dependence on line-of-sight links in millimeter wave (mmWave) and terahertz systems. By leveraging vision, this new research direction enables an interesting set of new capabilities such as vision-aided mmWave beam and blockage prediction, proactive hand-off, and resource allocation among others. These capabilities have the potential of reliably supporting highly-mobile applications such as vehicular/drone communications and wireless virtual/augmented reality in mmWave and terahertz systems. Investigating these interesting applications, however, requires the development of special datasets and machine learning tasks. Based on the Vision-Wireless (ViWi) dataset generation framework [1], this paper develops an advanced and realistic scenario/dataset that features multiple base stations, mobile users, and rich dynamics. Enabled by this dataset, the paper defines the vision-wireless mmWave beam tracking task (ViWi-BT) and proposes a baseline solution that can provide an initial benchmark for the future ViWi-BT algorithms.
Submitted 13 February, 2020; v1 submitted 6 February, 2020;
originally announced February 2020.
-
PPO Dash: Improving Generalization in Deep Reinforcement Learning
Authors:
Joe Booth
Abstract:
Deep reinforcement learning is prone to overfitting, and traditional benchmarks such as the Atari 2600 benchmark can exacerbate this problem. The Obstacle Tower Challenge addresses this by using randomized environments and separate seeds for training, validation, and test runs. This paper examines various improvements and best practices to the PPO algorithm using the Obstacle Tower Challenge to empirically study their impact with regards to generalization. Our experiments show that the combination provides state-of-the-art performance on the Obstacle Tower Challenge.
Submitted 25 July, 2019; v1 submitted 15 July, 2019;
originally announced July 2019.
-
The gradient flow structures of thermo-poro-visco-elastic processes in porous media
Authors:
Jakub Wiktor Both,
Kundan Kumar,
Jan Martin Nordbotten,
Florin Adrian Radu
Abstract:
In this paper, the inherent gradient flow structures of thermo-poro-visco-elastic processes in porous media are examined for the first time. In the first part, a modelling framework is introduced aiming for describing such processes as generalized gradient flows requiring choices of physical states, corresponding energies, dissipation potentials and external work rates. It is demonstrated that various existing models can be in fact written within this framework. Ultimately, the particular structure allows for a unified well-posedness analysis performed for different classes of linear and non-linear models. In the second part, the gradient flow structures are utilized for constructing efficient discrete approximation schemes for thermo-poro-visco-elasticity -- in particular robust, physical splitting schemes. Applying alternating minimization to naturally arising minimization formulations of (semi-)discrete models is proposed. For such, the energy decrease per iteration is quantified by applying abstract convergence theory only utilizing convexity and Lipschitz continuity properties of the problem -- a fairly simple but flexible machinery. By this approach, e.g., the widely used undrained and fixed-stress splits for the linear Biot equations are derived and analyzed. By application of the framework to more advanced models, novel splitting schemes with guaranteed theoretical convergence rates are naturally derived. Moreover, based on the minimization character of the (semi-)discrete equations, relaxation of splitting schemes by line search is proposed; numerical results show a potentially great impact on the acceleration of splitting schemes for both linear and nonlinear problems.
Submitted 25 November, 2019; v1 submitted 6 July, 2019;
originally announced July 2019.
-
Marathon Environments: Multi-Agent Continuous Control Benchmarks in a Modern Video Game Engine
Authors:
Joe Booth,
Jackson Booth
Abstract:
Recent advances in deep reinforcement learning in the paradigm of locomotion using continuous control have raised the interest of game makers for the potential of digital actors using active ragdoll. Currently, the available options to develop these ideas are either researchers' limited codebase or proprietary closed systems. We present Marathon Environments, a suite of open source, continuous control benchmarks implemented on the Unity game engine, using the Unity ML-Agents Toolkit. We demonstrate through these benchmarks that continuous control research is transferable to a commercial game engine. Furthermore, we exhibit the robustness of these environments by reproducing advanced continuous control research, such as learning to walk, run and backflip from motion capture data; learning to navigate complex terrains; and by implementing a video game input control system. We show further robustness by training with alternative algorithms found in OpenAI.Baselines. Finally, we share strategies for significantly reducing the training time.
Submitted 25 February, 2019;
originally announced February 2019.
-
Javelin: A Scalable Implementation for Sparse Incomplete LU Factorization
Authors:
Joshua Dennis Booth,
Gregory Bolet
Abstract:
In this work, we present a new scalable incomplete LU factorization framework called Javelin to be used as a preconditioner for solving sparse linear systems with iterative methods. Javelin allows for improved parallel factorization on shared-memory many-core systems by packaging the coefficient matrix into a format that allows for high performance sparse matrix-vector multiplication and sparse triangular solves with minimal overheads. The framework achieves these goals by using a collection of traditional permutations, point-to-point thread synchronizations, tasking, and segmented prefix scans in a conventional compressed sparse row format. Moreover, this framework stresses the importance of co-designing dependent tasks, such as sparse factorization and triangular solves, on highly-threaded architectures. Using these changes, traditional fill-in and drop tolerance methods can be used, while still being able to have observed speedups of up to ~42x on 68 Intel Knights Landing cores and ~12x on 14 Intel Haswell cores.
Submitted 2 May, 2019; v1 submitted 13 December, 2018;
originally announced December 2018.
-
3D Face Morphable Models "In-the-Wild"
Authors:
James Booth,
Epameinondas Antonakos,
Stylianos Ploumpis,
George Trigeorgis,
Yannis Panagakis,
Stefanos Zafeiriou
Abstract:
3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral as well as expressive faces. However, all datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models that are sufficient to reconstruct faces captured in unconstrained conditions ("in-the-wild"). In this paper, we propose the first, to the best of our knowledge, "in-the-wild" 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an "in-the-wild" texture model. We show that the employment of such an "in-the-wild" texture model greatly simplifies the fitting procedure, because there is no need to optimize with regards to the illumination parameters. Furthermore, we propose a new fast algorithm for fitting the 3DMM in arbitrary images. Finally, we have captured the first 3D facial database with relatively unconstrained conditions and report quantitative evaluations with state-of-the-art performance. Complementary qualitative reconstruction results are demonstrated on standard "in-the-wild" facial databases. An open source implementation of our technique is released as part of the Menpo Project.
Submitted 19 January, 2017;
originally announced January 2017.
-
Basker: A Threaded Sparse LU Factorization Utilizing Hierarchical Parallelism and Data Layouts
Authors:
Joshua Dennis Booth,
Sivasankaran Rajamanickam,
Heidi K. Thornquist
Abstract:
Scalable sparse LU factorization is critical for efficient numerical simulation of circuits and electrical power grids. In this work, we present a new scalable sparse direct solver called Basker. Basker introduces a new algorithm to parallelize the Gilbert-Peierls algorithm for sparse LU factorization. As architectures evolve, there exists a need for algorithms that are hierarchical in nature to match the hierarchy in thread teams, individual threads, and vector level parallelism. Basker is designed to map well to this hierarchy in architectures. There is also a need for data layouts to match multiple levels of hierarchy in memory. Basker uses a two-dimensional hierarchical structure of sparse matrices that maps to the hierarchy in the memory architectures and to the hierarchy in parallelism. We present performance evaluations of Basker on the Intel SandyBridge and Xeon Phi platforms using circuit and power grid matrices taken from the University of Florida sparse matrix collection and from Xyce circuit simulations. Basker achieves a geometric mean speedup of 5.91x on CPU (16 cores) and 7.4x on Xeon Phi (32 cores) relative to KLU. Basker outperforms Intel MKL Pardiso (PMKL) by as much as 53x on CPU (16 cores) and 13.3x on Xeon Phi (32 cores) for low fill-in circuit matrices. Furthermore, Basker provides 5.4x speedup on a challenging matrix sequence taken from an actual Xyce simulation.
Submitted 21 January, 2016;
originally announced January 2016.