-
Gauss-Newton Natural Gradient Descent for Physics-Informed Computational Fluid Dynamics
Authors:
Anas Jnini,
Flavio Vella,
Marius Zeinhofer
Abstract:
We propose Gauss-Newton's method in function space for the solution of the Navier-Stokes equations in the physics-informed neural network (PINN) framework. Upon discretization, this yields a natural gradient method that provably mimics the function space dynamics. Our computational results demonstrate close to single-precision accuracy measured in relative $L^2$ norm on a number of benchmark problems. To the best of our knowledge, this constitutes the first contribution in the PINN literature that solves the Navier-Stokes equations to this degree of accuracy. Finally, we show that given a suitable integral discretization, the proposed optimization algorithm agrees with Gauss-Newton's method in parameter space. This allows a matrix-free formulation enabling efficient scalability to large network sizes.
Submitted 16 February, 2024;
originally announced February 2024.
-
State of practice: evaluating GPU performance of state vector and tensor network methods
Authors:
Marzio Vallero,
Flavio Vella,
Paolo Rech
Abstract:
The frontier of quantum computing (QC) simulation on classical hardware is quickly reaching the hard scalability limits for computational feasibility. Nonetheless, there is still a need to simulate large quantum systems classically, as the Noisy Intermediate Scale Quantum (NISQ) devices are yet to be considered fault tolerant and performant enough in terms of operations per second. Each of the two main exact simulation techniques, state vector and tensor network simulators, has specific limitations. The exponential memory requirement of state vector simulation, when compared to the qubit register sizes of currently available quantum computers, quickly saturates the capacity of the top HPC machines currently available. Tensor network contraction approaches, which encode quantum circuits into tensor networks and then contract them over an output bit string to obtain its probability amplitude, are still limited by the inherent complexity of finding an optimal contraction path, which maps to a max-cut problem on a dense mesh, a notably NP-hard problem.
This article aims at investigating the limits of current state-of-the-art simulation techniques on a test bench made of eight widely used quantum subroutines, each in 31 different configurations, with special emphasis on performance. We then correlate the performance measures of the simulators with the metrics that characterise the benchmark circuits, identifying the main reasons behind the observed performance trend. From our observations, given the structure of a quantum circuit and the number of qubits, we highlight how to select the best simulation strategy, obtaining a speedup of up to an order of magnitude.
Submitted 11 January, 2024;
originally announced January 2024.
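The exponential memory requirement of state vector simulation mentioned in the abstract above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a dense state vector of 2^n amplitudes stored as double-precision complex numbers (16 bytes each), and the function name `state_vector_bytes` is hypothetical rather than taken from the paper.

```python
def state_vector_bytes(num_qubits, bytes_per_amplitude=16):
    """Bytes needed to store a dense state vector of `num_qubits` qubits.

    Assumption: one complex amplitude per basis state, 2**num_qubits states,
    16 bytes per amplitude (two 64-bit floats). Simulator overheads are ignored.
    """
    if num_qubits < 0:
        raise ValueError("num_qubits must be non-negative")
    return (2 ** num_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (30, 40, 50):
        gib = state_vector_bytes(n) / 2 ** 30
        print(f"{n} qubits: {gib:,.0f} GiB")
```

Under these assumptions, 30 qubits already need 16 GiB, and every additional qubit doubles the footprint.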
-
HandiMathKey-Device
Authors:
Frédéric Vella,
Nathalie Dubus,
Eloise Grolleau,
Marjorie Deleau,
Cécile Malet,
Christine Gallard,
Véronique Ades,
Nadine Vigouroux
Abstract:
Typing mathematics is sometimes difficult with text editor functions for students with motor impairment and other associated impairments (visual, cognitive). Based on the HandiMathKey software keyboard, a user-centred design method involving the ecosystem of disabled students was applied to design the HMK-D physical keyboard for mathematical input. We opted for the Stream Deck device for the HMK-D because of its multimedia features and its appeal to young students. Preliminary tests with 8 students (5 in secondary school and 3 in high school) show that HMK-D is highly accepted, accessible and fun for mathematical input by students with impairments. A longitudinal study of the usability and acceptability of HMK-D is planned for the 2023-2024 school year.
Submitted 23 November, 2023;
originally announced November 2023.
-
A first step towards an ecosystem meta-model for humancentered design in case of disabled users
Authors:
Christophe Kolski,
Nadine Vigouroux,
Yohan Guerrier,
Frédéric Vella,
Marine Guffroy
Abstract:
The involvement of the ecosystem or social environment of the disabled user is considered very useful, even essential, for the human-centered design of assistive technologies. In the era of model-based approaches, the modeling of the ecosystem therefore deserves consideration. A first version of an ecosystem metamodel is proposed and illustrated through a first case study, which concerns a project aiming at a communication aid for people with cerebral palsy. The paper ends with a conclusion and research perspectives.
Submitted 23 November, 2023;
originally announced November 2023.
-
Design Recommendations Based on Speech Analysis for Disability-Friendly Interfaces for the Control of a Home Automation Environment
Authors:
Nadine Vigouroux,
Frédéric Vella,
Gaëlle Lepage,
Éric Campo
Abstract:
The objective of this paper is to describe a study of the speech interaction mode for home automation control of equipment by people with impairments in inclusive housing. The study is related to the HIP HOPE project concerning a building of 19 inclusive housing units. Seven participants with different types of disabilities were invited to carry out use cases using voice and touch control. Only the results obtained on the voice interaction mode through the Amazon voice assistant are reported here. The results show, according to the type of disability, the success rates in the speech recognition of the command issued to the equipment and highlight the errors related to the formulation, the noisy environment, speech intelligibility, speech segmentation and poor synchronization of the audio channel opening.
Submitted 22 November, 2023;
originally announced November 2023.
-
Marital Sorting, Household Inequality and Selection
Authors:
Iván Fernández-Val,
Aico van Vuuren,
Francis Vella
Abstract:
Using CPS data for 1976 to 2022 we explore how wage inequality has evolved for married couples with both spouses working full time full year, and its impact on household income inequality. We also investigate how marriage sorting patterns have changed over this period. To determine the factors driving income inequality we estimate a model explaining the joint distribution of wages which accounts for the spouses' employment decisions. We find that income inequality has increased for these households and increased assortative matching of wages has exacerbated the inequality resulting from individual wage growth. We find that positive sorting partially reflects the correlation across unobservables influencing the wages of both members of the marriage. We decompose the changes in sorting patterns over the 47 years comprising our sample into structural, composition and selection effects and find that the increase in positive sorting primarily reflects the increased skill premia for both observed and unobserved characteristics.
Submitted 11 October, 2023;
originally announced October 2023.
-
Scaling Expected Force: Efficient Identification of Key Nodes in Network-based Epidemic Models
Authors:
Paolo Sylos Labini,
Andrej Jurco,
Matteo Ceccarello,
Stefano Guarino,
Enrico Mastrostefano,
Flavio Vella
Abstract:
Centrality measures are fundamental tools of network analysis as they highlight the key actors within the network. This study focuses on a newly proposed centrality measure, Expected Force (EF), and its use in identifying spreaders in network-based epidemic models. We found that EF effectively predicts the spreading power of nodes and identifies key nodes and immunization targets. However, its high computational cost presents a challenge for its use in large networks. To overcome this limitation, we propose two parallel scalable algorithms for computing EF scores: the first algorithm is based on the original formulation, while the second one focuses on a cluster-centric approach to improve efficiency and scalability. Our implementations significantly reduce computation time, allowing for the detection of key nodes at large scales. Performance analysis on synthetic and real-world networks demonstrates that the GPU implementation of our algorithm can efficiently scale to networks with up to 44 million edges by exploiting modern parallel architectures, achieving speed-ups of up to 300x, and 50x on average, compared to the simple parallel solution.
Submitted 1 June, 2023;
originally announced June 2023.
-
Multi-GPU aggregation-based AMG preconditioner for iterative linear solvers
Authors:
Massimo Bernaschi,
Alessandro Celestini,
Pasqua D'Ambra,
Flavio Vella
Abstract:
We present and release in open source format a sparse linear solver which efficiently exploits heterogeneous parallel computers. The solver can be easily integrated into scientific applications that need to solve large and sparse linear systems on modern parallel computers made of hybrid nodes hosting NVIDIA Graphics Processing Unit (GPU) accelerators.
The work extends our previous efforts in the exploitation of a single GPU accelerator and proposes an implementation, based on the hybrid MPI-CUDA software environment, of a Krylov-type linear solver relying on an efficient Algebraic MultiGrid (AMG) preconditioner already available in the BootCMatchG library. Our design for the hybrid implementation has been driven by the best practices for minimizing data communication overhead when multiple GPUs are employed, yet preserving the efficiency of the single GPU kernels. Strong and weak scalability results on well-known benchmark test cases of the new version of the library are discussed. Comparisons with the Nvidia AmgX solution show an improvement of up to 2.0x in the solve phase.
Submitted 4 March, 2023;
originally announced March 2023.
-
Towards a learning-based performance modeling for accelerating Deep Neural Networks
Authors:
Damiano Perri,
Paolo Sylos Labini,
Osvaldo Gervasi,
Sergio Tasso,
Flavio Vella
Abstract:
Emerging applications such as Deep Learning are often data-driven, so traditional approaches based on auto-tuners do not perform well across the wide range of inputs used in practice. In the present paper, we start an investigation of predictive models based on machine learning techniques in order to optimize Convolutional Neural Networks (CNNs). As a use-case, we focus on the ARM Compute Library, which provides three different implementations of the convolution operator at different numeric precision. Starting from a collation of benchmarks, we build and validate models learned by Decision Tree and naive Bayes classifiers. Preliminary experiments on a Midgard-based ARM Mali GPU show that our predictive model outperforms all the convolution operators manually selected by the library.
Submitted 9 December, 2022;
originally announced December 2022.
-
User Centred Method to Design a Platform to Design Augmentative and Alternative Communication Assistive Technologies
Authors:
Frédéric Vella,
Flavien Clastres-Babou,
Nadine Vigouroux,
Philippe Truillet,
Charline Calmels,
Caroline Mercadier,
Karine Gigaud,
Margot Issanchou,
Kristina Gourinovitch,
Anne Garaix
Abstract:
We describe a co-design approach to design the online WebSoKeyTo used to design AAC. This co-design was carried out between a team of therapists and a team of human-computer interaction researchers. Our approach begins with the use and evaluation of an existing SoKeyTo AAC design application. This step was essential for the therapists to become aware of and define their needs, and for the researchers to understand the poor usability scores of SoKeyTo. We then describe the various phases (focus group, brainstorming, prototyping) and the co-design choices retained. An evaluation of WebSoKeyTo is in progress.
Submitted 23 November, 2022;
originally announced November 2022.
-
Participation of Stakeholder in the Design of a Conception Application of Augmentative and Alternative Communication
Authors:
Frédéric Vella,
Flavien Clastres-Babou,
Nadine Vigouroux,
Philippe Truillet,
Charline Calmels,
Caroline Mercadier,
Karine Gigaud,
Margot Issanchou,
Kristina Gourinovitch,
Anne Garaix
Abstract:
The objective of this paper is to describe the involvement of an interdisciplinary team in a user-centered design methodology to design the platform (WebSoKeyTo) that meets the needs of therapists to design augmentative and alternative communication (AAC) aids for disabled users. We describe the steps of the design process and the role of the various actors (therapists and human-computer interaction researchers) in the various phases of the process. Finally, we analyze the therapists' responses on a satisfaction scale regarding their participation in the co-design process. This study demonstrates the interest in extending the design actors to other therapists and caregivers (professional and family) in the daily life of people with disabilities.
Submitted 23 November, 2022;
originally announced November 2022.
-
IDEALI: intuitively localising connected devices in order to support autonomy
Authors:
Frédéric Vella,
Réjane Dalcé,
Antonio Serpa,
Thierry Val,
Adrien van Den Bossche,
Nadine Vigouroux
Abstract:
The ability to localise a smart device is very useful to visually or cognitively impaired people. Localisation-capable technologies are becoming more readily available as off-the-shelf components. In this paper, we highlight the need for such a service in the field of health and autonomy, especially for disabled people. We introduce a model for Semantic Position Description (SPD) (e.g. "The pill organiser is on the kitchen table") as well as various algorithms that transform raw distance estimations to SPD related to proximity, alignment and room identification. Two of these algorithms are evaluated using the LocURa4IoT testbed. The results are compared to the output of a pre-experiment involving ten human participants in the Maison Intelligente de Blagnac. The two studies indicate that both approaches converge up to 90% of the time.
Submitted 23 November, 2022;
originally announced November 2022.
-
Usability Study of Tactile and Voice Interaction Modes by People with Disabilities for Home Automation Controls
Authors:
Nadine Vigouroux,
Frédéric Vella,
Gaëlle Lepage,
Eric Campo
Abstract:
This paper presents a comparative usability study on tactile and vocal interaction modes for controlling home automation equipment, covering different profiles of disabled people. The study is related to the HIP HOPE project concerning the construction of 19 inclusive housing units in the Toulouse metropolitan area in France. The experimentation took place in a living lab with 7 disabled people who carried out realistic use cases. The USE and UEQ questionnaires were selected as usability tools. The first results show that both interfaces are easy to learn but that the usefulness and ease of use dimensions need to be improved. This study shows that there is a real need for multimodality between touch and voice interaction to control the smart home. It also shows that there is a need to adapt the interface and the environment to the person's disability.
Submitted 23 November, 2022;
originally announced November 2022.
-
ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations
Authors:
Maciej Besta,
Cesare Miglioli,
Paolo Sylos Labini,
Jakub Tětek,
Patrick Iff,
Raghavendra Kanakagiri,
Saleh Ashkboos,
Kacper Janda,
Michal Podstawski,
Grzegorz Kwasniewski,
Niels Gleinig,
Flavio Vella,
Onur Mutlu,
Torsten Hoefler
Abstract:
Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as Bloom filters. These representations are much faster to process than the original vertex sets thanks to vectorizability and small size. We use these representations as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. When enhanced with ProbGraph, these algorithms significantly outperform tuned parallel exact baselines (up to nearly 50x on 32 cores) while ensuring accuracy of more than 90% for many input graph datasets. Our novel bounds and algorithms based on probabilistic set representations with desirable statistical properties are of separate interest for the data analytics community.
Submitted 21 November, 2022; v1 submitted 24 August, 2022;
originally announced August 2022.
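The key idea in the abstract above, representing vertex sets with Bloom filters and estimating set sizes from them, can be sketched as follows. This is a minimal illustration, not ProbGraph's implementation: the class `BloomFilter`, the SHA-256 based hashing, the filter parameters, and the inclusion-exclusion intersection estimate are all assumptions chosen for clarity. The cardinality estimate uses the textbook formula n ≈ -(m/k) ln(1 - X/m), where X is the number of set bits.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter (illustrative sketch, hypothetical API)."""

    def __init__(self, num_bits=8192, num_hashes=2):
        self.m = num_bits
        self.k = num_hashes
        self.bits = 0  # a Python int used as a bit vector

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests (deterministic).
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def union(self, other):
        # Bitwise OR gives exactly the Bloom filter of the set union.
        out = BloomFilter(self.m, self.k)
        out.bits = self.bits | other.bits
        return out

    def estimate_size(self):
        # Textbook estimate of the number of inserted items from set bits.
        ones = bin(self.bits).count("1")
        if ones >= self.m:
            return float("inf")
        return -(self.m / self.k) * math.log(1 - ones / self.m)


def estimate_intersection(a, b):
    # Inclusion-exclusion on the three size estimates: |A| + |B| - |A u B|.
    return a.estimate_size() + b.estimate_size() - a.union(b).estimate_size()


if __name__ == "__main__":
    neighbours_u, neighbours_v = BloomFilter(), BloomFilter()
    for x in range(0, 300):
        neighbours_u.add(x)
    for x in range(200, 500):
        neighbours_v.add(x)
    # The true intersection size is 100; the estimate is approximate.
    print(round(estimate_intersection(neighbours_u, neighbours_v), 1))
```

The appeal of such representations is that union and size estimation reduce to bitwise operations and population counts on fixed-size words, which vectorize well; the price is an approximate answer whose error depends on the filter size.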
-
Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching
Authors:
András Strausz,
Flavio Vella,
Salvatore Di Girolamo,
Maciej Besta,
Torsten Hoefler
Abstract:
Triangle count and local clustering coefficient are two core metrics for graph analysis. They find broad application in analyses such as community detection and link recommendation. Current state-of-the-art solutions suffer from synchronization overheads or expensive pre-computations needed to distribute the graph, achieving limited scaling capabilities. We propose a fully asynchronous implementation for triangle counting and local clustering coefficient based on 1D partitioning, using remote memory accesses for transferring data and avoiding synchronization. Additionally, we show how these algorithms present data reuse on remote memory accesses and how the overall communication time can be improved by caching these accesses. Finally, we extend CLaMPI, a software-layer caching system for MPI RMA, to include application-specific scores for cached entries and influence the eviction procedure to improve caching efficiency. Our results show improvements on shared memory, and we achieve 14x speedup from 4 to 64 nodes for the LiveJournal1 graph on distributed memory. Moreover, we demonstrate how caching remote accesses reduces total running time by up to 73% with respect to a non-cached version. Finally, we compare our implementation to TriC, the 2020 graph champion paper, and achieve up to 100x faster results for scale-free graphs.
Submitted 1 March, 2022; v1 submitted 28 February, 2022;
originally announced February 2022.
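For readers unfamiliar with the two metrics named in the abstract above, the following single-process sketch computes triangle counts and the local clustering coefficient (LCC) of an undirected graph. It illustrates the definitions only and makes no attempt to reproduce the paper's asynchronous, distributed-memory algorithm; all function names are hypothetical. The LCC of a vertex v with degree d and t triangles is taken as 2t / (d(d-1)), and as 0 when d < 2.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets; self-loops and duplicate edges are ignored."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_per_vertex(adj):
    """Number of triangles each vertex participates in."""
    counts = {}
    for v, nbrs in adj.items():
        # A triangle through v is a pair of neighbours that are themselves adjacent.
        counts[v] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return counts


def total_triangles(adj):
    # Each triangle is counted once at each of its three vertices.
    return sum(triangles_per_vertex(adj).values()) // 3


def local_clustering_coefficient(adj):
    tri = triangles_per_vertex(adj)
    lcc = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        lcc[v] = 2 * tri[v] / (d * (d - 1)) if d >= 2 else 0.0
    return lcc


if __name__ == "__main__":
    # One triangle (0, 1, 2) plus a pendant vertex 3 attached to vertex 2.
    graph = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
    print(total_triangles(graph))
    print(local_clustering_coefficient(graph))
```

In a distributed setting, the membership test `b in adj[a]` is the step that may require fetching a remote adjacency list, which is where remote memory access and caching enter the picture.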
-
Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators
Authors:
Paolo Sylos Labini,
Massimo Bernaschi,
Francesco Silvestri,
Flavio Vella
Abstract:
Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computationally expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since their features are specifically designed for tensor algebra (typically dense matrix-product), it is commonly assumed that they are not suitable for applications with sparse data. To challenge this viewpoint, we discuss methods and present solutions for accelerating sparse matrix multiplication on such architectures. In particular, we present a 1-dimensional blocking algorithm with theoretical guarantees on the density, which builds dense blocks from arbitrary sparse matrices. Experimental results show that, even for unstructured and highly-sparse matrices, our block-based solution which exploits Nvidia Tensor Cores is faster than its sparse counterpart. We observed significant speed-ups of up to two orders of magnitude on real-world sparse matrices.
Submitted 11 February, 2022;
originally announced February 2022.
-
Dynamic Heterogeneous Distribution Regression Panel Models, with an Application to Labor Income Processes
Authors:
Ivan Fernandez-Val,
Wayne Yuan Gao,
Yuan Liao,
Francis Vella
Abstract:
We consider the estimation of a dynamic distribution regression panel data model with heterogeneous coefficients across units. The objects of primary interest are specific functionals of these coefficients. These include predicted actual and stationary distributions of the outcome variable and quantile treatment effects. Coefficients and their functionals are estimated via fixed effect methods. We investigate how these functionals vary in response to changes in initial conditions or covariate values. We also identify a uniformity issue related to the robustness of inference to the unknown degree of heterogeneity, and propose a cross-sectional bootstrap method for uniformly valid inference on function-valued objects. Employing PSID annual labor income data we illustrate some important empirical issues we can address. We first quantify the impact of a negative labor income shock on the distribution of future labor income. We also examine the impact on the distribution of labor income from increasing the education level of a chosen group of workers. Finally, we demonstrate the existence of heterogeneity in income mobility, and how this leads to substantial variation in individuals' incidence of being trapped in poverty. We also provide simulation evidence confirming that our procedures work well.
Submitted 14 January, 2023; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Hours Worked and the U.S. Distribution of Real Annual Earnings 1976-2019
Authors:
Iván Fernández-Val,
Franco Peracchi,
Aico van Vuuren,
Francis Vella
Abstract:
We examine the impact of annual hours worked on annual earnings by decomposing changes in the real annual earnings distribution into composition, structural and hours effects. We do so via a nonseparable simultaneous model of hours, wages and earnings. Using the Current Population Survey for the survey years 1976-2019, we find that changes in the female distribution of annual hours of work are important in explaining movements in inequality in female annual earnings. This captures the substantial changes in their employment behavior over this period. Movements in the male hours distribution only affect the lower part of their earnings distribution and reflect the sensitivity of these workers' annual hours of work to cyclical factors.
Submitted 18 November, 2021; v1 submitted 25 February, 2020;
originally announced February 2020.
-
GPU-based parallelism for ASP-solving
Authors:
Agostino Dovier,
Andrea Formisano,
Flavio Vella
Abstract:
Answer Set Programming (ASP) has become the paradigm of choice in the field of logic programming and non-monotonic reasoning. Thanks to the availability of efficient solvers, ASP has been successfully employed in a large number of application domains. The term GPU-computing indicates a recent programming paradigm aimed at enabling the use of modern parallel Graphical Processing Units (GPUs) for general purpose computing. In this paper we describe an approach to ASP-solving that exploits GPU parallelism. The design of a GPU-based solver poses various challenges due to the peculiarities of GPUs' software and hardware architectures and to the intrinsic nature of the satisfiability problem.
Submitted 4 September, 2019;
originally announced September 2019.
-
A Computational Model for Tensor Core Units
Authors:
Rezaul Chowdhury,
Francesco Silvestri,
Flavio Vella
Abstract:
To respond to the need for efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature of these architectures is a hardware circuit for efficiently computing a dense matrix multiplication of a given small size. In order to broaden the class of algorithms that exploit these systems, we propose a computational model, named the TCU model, that captures the ability to natively multiply small matrices. We then use the TCU model for designing fast algorithms for several problems, including matrix operations (dense and sparse multiplication, Gaussian Elimination), graph algorithms (transitive closure, all pairs shortest distances), Discrete Fourier Transform, stencil computations, integer multiplication, and polynomial evaluation. We finally highlight a relation between the TCU model and the external memory model.
Submitted 9 July, 2020; v1 submitted 19 August, 2019;
originally announced August 2019.
-
Selection and the Distribution of Female Hourly Wages in the U.S
Authors:
Iván Fernández-Val,
Franco Peracchi,
Aico van Vuuren,
Francis Vella
Abstract:
We analyze the role of selection bias in generating the changes in the observed distribution of female hourly wages in the United States using CPS data for the years 1975 to 2020. We account for the selection bias from the employment decision by modeling the distribution of the number of working hours and estimating a nonseparable model of wages. We decompose changes in the wage distribution into composition, structural and selection effects. Composition effects have increased wages at all quantiles while the impact of the structural effects varies by time period and quantile. Changes in the role of selection only appear at the lower quantiles of the wage distribution. The evidence suggests that there is positive selection in the 1970s which diminishes until the later 1990s. This reduces wages at lower quantiles and increases wage inequality. Post 2000 there appears to be an increase in positive sorting which reduces the selection effects on wage inequality.
Submitted 27 January, 2022; v1 submitted 21 December, 2018;
originally announced January 2019.
-
A model-driven approach for a new generation of adaptive libraries
Authors:
Marco Cianfriglia,
Flavio Vella,
Cedric Nugteren,
Anton Lokhmotov,
Grigori Fursin
Abstract:
Efficient high-performance libraries often expose multiple tunable parameters to provide highly optimized routines. These can range from simple loop unroll factors or vector sizes all the way to algorithmic changes, given that some implementations can be more suitable for certain devices by exploiting hardware characteristics such as local memories and vector units. Traditionally, such parameters and algorithmic choices are tuned and then hard-coded for a specific architecture and for certain characteristics of the inputs. However, emerging applications are often data-driven, thus traditional approaches are not effective across the wide range of inputs and architectures used in practice. In this paper, we present a new adaptive framework for data-driven applications which uses a predictive model to select the optimal algorithmic parameters by training with synthetic and real datasets. We demonstrate its effectiveness on a BLAS library, specifically on its matrix multiplication routine. We present experimental results for two GPU architectures and show significant performance gains of up to 3x (on a high-end NVIDIA Pascal GPU) and 2.5x (on an embedded ARM Mali GPU) when compared to a traditionally optimized library.
Submitted 19 June, 2018;
originally announced June 2018.
-
Nonseparable Sample Selection Models with Censored Selection Rules
Authors:
Iván Fernández-Val,
Aico van Vuuren,
Francis Vella
Abstract:
We consider identification and estimation of nonseparable sample selection models with censored selection rules. We employ a control function approach and discuss different objects of interest based on (1) local effects conditional on the control function, and (2) global effects obtained from integration over ranges of values of the control function. We derive the conditions for the identification of these different objects and suggest strategies for estimation. Moreover, we provide the associated asymptotic theory. These strategies are illustrated in an empirical investigation of the determinants of female wages in the United Kingdom.
Submitted 29 September, 2020; v1 submitted 26 January, 2018;
originally announced January 2018.
-
Semiparametric Estimation of Structural Functions in Nonseparable Triangular Models
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Whitney Newey,
Sami Stouli,
Francis Vella
Abstract:
Triangular systems with nonadditively separable unobserved heterogeneity provide a theoretically appealing framework for the modelling of complex structural relationships. However, they are not commonly used in practice due to the need for exogenous variables with large support for identification, the curse of dimensionality in estimation, and the lack of inferential tools. This paper introduces two classes of semiparametric nonseparable triangular models that address these limitations. They are based on distribution and quantile regression modelling of the reduced form conditional distributions of the endogenous variables. We show that average, distribution and quantile structural functions are identified in these systems through a control function approach that does not require a large support condition. We propose a computationally attractive three-stage procedure to estimate the structural functions where the first two stages consist of quantile or distribution regressions. We provide asymptotic theory and uniform inference methods for each stage. In particular, we derive functional central limit theorems and bootstrap functional central limit theorems for the distribution regression estimators of the structural functions. These results establish the validity of the bootstrap for three-stage estimators of structural functions, and lead to simple inference algorithms. We illustrate the implementation and applicability of all our methods with numerical simulations and an empirical application to demand analysis.
Submitted 5 October, 2019; v1 submitted 6 November, 2017;
originally announced November 2017.
-
Accelerating Energy Games Solvers on Modern Architectures
Authors:
Andrea Formisano,
Raffaella Gentilini,
Flavio Vella
Abstract:
Quantitative games, where quantitative objectives are defined on weighted game arenas, provide natural tools for designing faithful models of embedded controllers. Instances of these games that recently gained interest are the so-called Energy Games. The fastest known algorithm solves Energy Games in O(EVW), where W is the maximum weight. Starting from a sequential baseline implementation, we investigate the use of the massive data-parallel computation capabilities supported by modern Graphics Processing Units to solve the 'initial credit problem' for Energy Games. We present four different parallel implementations on multi-core CPU and GPU systems. Our solution outperforms the baseline implementation by up to 36x speedup and obtains a faster convergence time on real-world graphs.
Submitted 10 October, 2017;
originally announced October 2017.
-
Creative Robot Dance with Variational Encoder
Authors:
Agnese Augello,
Emanuele Cipolla,
Ignazio Infantino,
Adriano Manfre,
Giovanni Pilato,
Filippo Vella
Abstract:
What we appreciate in dance is the ability of people to spontaneously improvise new movements and choreographies, surrendering to the rhythm of the music, being inspired by their current perceptions and sensations and by previous experiences deeply stored in their memory. Like other human abilities, this, of course, is challenging to reproduce in an artificial entity such as a robot. Recent generations of anthropomorphic robots, the so-called humanoids, however, exhibit more and more sophisticated skills and have raised interest in the robotics community in designing and experimenting with systems devoted to automatic dance generation. In this work, we highlight the importance of modeling computational creativity behavior in dancing robots to avoid a mere execution of preprogrammed dances. In particular, we exploit a deep learning approach that allows a robot to generate new dancing movements in real time according to the music it listens to.
Submitted 5 July, 2017;
originally announced July 2017.
-
Scaling betweenness centrality using communication-efficient sparse matrix multiplication
Authors:
Edgar Solomonik,
Maciej Besta,
Flavio Vella,
Torsten Hoefler
Abstract:
Betweenness centrality (BC) is a crucial graph problem that measures the significance of a vertex by the number of shortest paths leading through it. We propose Maximal Frontier Betweenness Centrality (MFBC): a succinct BC algorithm based on novel sparse matrix multiplication routines that performs a factor of $p^{1/3}$ less communication on $p$ processors than the best known alternatives, for graphs with $n$ vertices and average degree $k=n/p^{2/3}$. We formulate, implement, and prove the correctness of MFBC for weighted graphs by leveraging monoids instead of semirings, which enables a surprisingly succinct formulation. MFBC scales well for both extremely sparse and relatively dense graphs. It automatically searches a space of distributed data decompositions and sparse matrix multiplication algorithms for the most advantageous configuration. The MFBC implementation outperforms the well-known CombBLAS library by up to 8x and shows more robust performance. Our design methodology is readily extensible to other graph problems.
Submitted 9 August, 2017; v1 submitted 22 September, 2016;
originally announced September 2016.
-
Algorithms and Heuristics for Scalable Betweenness Centrality Computation on Multi-GPU Systems
Authors:
Flavio Vella,
Giancarlo Carbone,
Massimo Bernaschi
Abstract:
Betweenness Centrality (BC) is steadily growing in popularity as a metric of the influence of a vertex in a graph. The BC score of a vertex is proportional to the number of all-pairs-shortest-paths passing through it. However, complete and exact BC computation for a large-scale graph is an extraordinary challenge that requires high performance computing techniques to provide results in a reasonable amount of time. Our approach combines bi-dimensional (2-D) decomposition of the graph and multi-level parallelism together with a suitable data-thread mapping that overcomes most of the difficulties caused by the irregularity of the computation on GPUs. Furthermore, we propose novel heuristics which exploit the topology information of the graph in order to reduce time and space requirements of BC computation. Experimental results on synthetic and real-world graphs show that the proposed techniques allow the BC computation of graphs which are too large to fit in the memory of a single computational node along with a significant reduction of the computing time.
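To make the BC definition in this abstract concrete, here is a minimal sequential sketch of Brandes' algorithm for unweighted graphs. It is not the multi-GPU method the paper describes. The function name `betweenness_centrality` and the adjacency-dict input format are assumptions chosen for illustration. Scores are unnormalized and count ordered source-target pairs, so on an undirected graph each unordered pair contributes twice.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness centrality for an unweighted graph.

    adj: dict mapping each vertex to an iterable of its neighbours
         (hypothetical input format, chosen for this sketch).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []  # vertices in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies from the farthest vertices.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

Each source vertex costs one breadth-first search plus one backward sweep, so the sketch runs in O(VE) time on unweighted graphs. That per-source cost is why exact BC on large graphs calls for the parallel techniques the abstract mentions.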
Submitted 2 February, 2016;
originally announced February 2016.
-
Artwork creation by a cognitive architecture integrating computational creativity and dual process approaches
Authors:
Agnese Augello,
Ignazio Infantino,
Antonio Lieto,
Giovanni Pilato,
Riccardo Rizzo,
Filippo Vella
Abstract:
The paper proposes a novel cognitive architecture (CA) for computational creativity based on the Psi model and on the mechanisms inspired by dual process theories of reasoning and rationality. In recent years, many cognitive models have focused on dual process theories to better describe and implement complex cognitive skills in artificial agents, but creativity has been approached only at a descriptive level. In previous works we have described various modules of the cognitive architecture that allows a robot to execute creative paintings. By means of dual process theories we refine some relevant mechanisms to obtain artworks, and in particular we explain details about the resolution level of the CA dealing with different strategies of access to the Long Term Memory (LTM) and managing the interaction between S1 and S2 processes of the dual process theory. The creative process involves both divergent and convergent processes in either an implicit or explicit manner. This leads to four activities (exploratory, reflective, tacit, and analytic) that, triggered by urges and motivations, generate creative acts. These creative acts exploit both the LTM and the WM in order to make novel substitutions to a perceived image by properly mixing parts of pictures coming from different domains. The paper highlights the role of the interaction between S1 and S2 processes, modulated by the resolution level, which focuses the attention of the creative agent by broadening or narrowing the exploration of novel solutions, or even drawing the solution from a set of already made associations. An example of an artificial painter is described in experiments using a robotic platform.
Submitted 4 January, 2016;
originally announced January 2016.