Search | arXiv e-print repository

Experience and Analysis of Scalable High-Fidelity Computational Fluid Dynamics on Modular Supercomputing Architectures

Authors: Martin Karp, Estela Suarez, Jan H. Meinke, Måns I. Andersson, Philipp Schlatter, Stefano Markidis, Niclas Jansson

Abstract: The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture at scale through domain partitioning, where the computational domain is split between a Booster module powered by GPUs and a Cluster module with conventional CPU nodes. We investigate several different flow cases and computer systems based on the modular supercomputing architecture (MSA). We observe that for our simulations, the communication overhead and load balancing issues incurred by incorporating different computing architectures are seldom worthwhile, especially when I/O is also considered, but when the simulation at hand requires more than the combined global memory on the GPUs, utilizing additional CPUs to increase the available memory can be fruitful. We support our results with a simple performance model to assess when running across modules might be beneficial. As MSA is becoming more widespread and efforts to increase system utilization are growing more important our results give insight into when and how a monolithic application can utilize and spread out to more than one module and obtain a faster time to solution.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 13 pages, 5 figures, 3 tables, preprint

ACM Class: J.2; C.1.4; G.4

arXiv:2405.05639 [pdf, other]

Supercomputers as a Continous Medium

Authors: Martin Karp, Niclas Jansson, Philipp Schlatter, Stefano Markidis

Abstract: As supercomputers' complexity has grown, the traditional boundaries between processor, memory, network, and accelerators have blurred, making a homogeneous computer model, in which the overall computer system is modeled as a continuous medium with homogeneously distributed computational power, memory, and data movement transfer capabilities, an intriguing and powerful abstraction. By applying a homogeneous computer model to algorithms with a given I/O complexity, we recover from first principles, other discrete computer models, such as the roofline model, parallel computing laws, such as Amdahl's and Gustafson's laws, and phenomenological observations, such as super-linear speedup. One of the homogeneous computer model's distinctive advantages is the capability of directly linking the performance limits of an application to the physical properties of a classical computer system. Applying the homogeneous computer model to supercomputers, such as Frontier, Fugaku, and the Nvidia DGX GH200, shows that applications, such as Conjugate Gradient (CG) and Fast Fourier Transforms (FFT), are rapidly approaching the fundamental classical computational limits, where the performance of even denser systems in terms of compute and memory are fundamentally limited by the speed of light.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 10 pages, 8 figures, 3 tables

ACM Class: F.1; F.2; I.6

arXiv:2306.08522 [pdf, other]

Challenges of Indoor SLAM: A multi-modal multi-floor dataset for SLAM evaluation

Authors: Pushyami Kaveti, Aniket Gupta, Dennis Giaya, Madeline Karp, Colin Keil, Jagatpreet Nir, Zhiyong Zhang, Hanumant Singh

Abstract: Robustness in Simultaneous Localization and Mapping (SLAM) remains one of the key challenges for the real-world deployment of autonomous systems. SLAM research has seen significant progress in the last two and a half decades, yet many state-of-the-art (SOTA) algorithms still struggle to perform reliably in real-world environments. There is a general consensus in the research community that we need challenging real-world scenarios which bring out different failure modes in sensing modalities. In this paper, we present a novel multi-modal indoor SLAM dataset covering challenging common scenarios that a robot will encounter and should be robust to. Our data was collected with a mobile robotics platform across multiple floors at Northeastern University's ISEC building. Such a multi-floor sequence is typical of commercial office spaces characterized by symmetry across floors and, thus, is prone to perceptual aliasing due to similar floor layouts. The sensor suite comprises seven global shutter cameras, a high-grade MEMS inertial measurement unit (IMU), a ZED stereo camera, and a 128-channel high-resolution lidar. Along with the dataset, we benchmark several SLAM algorithms and highlight the problems faced during the runs, such as perceptual aliasing, visual degradation, and trajectory drift. The benchmarking results indicate that parts of the dataset work well with some algorithms, while other data sections are challenging for even the best SOTA algorithms. The dataset is available at https://github.com/neufieldrobotics/NUFR-M3F.

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2304.10403 [pdf, other]

doi 10.3847/2041-8213/acd3e9

Anisotropic Satellite Galaxy Quenching: A Unique Signature of Energetic Feedback by Supermassive Black Holes?

Authors: Juliana S. M. Karp, Johannes U. Lange, Risa H. Wechsler

Abstract: The quenched fraction of satellite galaxies is aligned with the orientation of the halo's central galaxy, such that on average, satellites form stars at a lower rate along the major axis of the central. This effect, called anisotropic satellite galaxy quenching (ASGQ), has been found in observational data and cosmological simulations. Analyzing the IllustrisTNG simulation, Martín-Navarro et al. (2021) recently argued that ASGQ is caused by anisotropic energetic feedback and constitutes "compelling observational evidence for the role of black holes in regulating galaxy evolution." In this letter, we study the causes of ASGQ in state-of-the-art galaxy formation simulations to evaluate this claim. We show that cosmological simulations predict that on average, satellite galaxies along the major axis of the dark matter halo tend to have been accreted at earlier cosmic times and are hosted by subhalos of larger peak halo masses. As a result, a modulation of the quenched fraction with respect to the major axis of the central galaxy is a natural prediction of hierarchical structure formation. We show that ASGQ is predicted by the UniverseMachine galaxy formation model, a model without anisotropic feedback. Furthermore, we demonstrate that even in the IllustrisTNG simulation, anisotropic satellite accretion properties are the main cause of ASGQ. Ultimately, we argue that ASGQ is not a reliable indicator of supermassive black hole feedback in galaxy formation simulations and, thus, should not be interpreted as such in observational data.

Submitted 20 April, 2023; originally announced April 2023.

Comments: 7 pages, 4 figures; Submitted to ApJL; Comments welcome!

arXiv:2207.07098 [pdf, other]

Large-Scale Direct Numerical Simulations of Turbulence Using GPUs and Modern Fortran

Authors: Martin Karp, Daniele Massaro, Niclas Jansson, Alistair Hart, Jacob Wahlgren, Philipp Schlatter, Stefano Markidis

Abstract: We present our approach to making direct numerical simulations of turbulence with applications in sustainable shipping. We use modern Fortran and the spectral element method to leverage and scale on supercomputers powered by the Nvidia A100 and the recent AMD Instinct MI250X GPUs, while still providing support for user software developed in Fortran. We demonstrate the efficiency of our approach by performing the world's first direct numerical simulation of the flow around a Flettner rotor at Re=30'000 and its interaction with a turbulent boundary layer. We present one of the first performance comparisons between the AMD Instinct MI250X and Nvidia A100 GPUs for scalable computational fluid dynamics. Our results show that one MI250X offers performance on par with two A100 GPUs and has a similar power efficiency.

Submitted 23 June, 2022; originally announced July 2022.

Comments: 13 pages, 7 figures

ACM Class: G.4; J.2

arXiv:2204.12526 [pdf, other]

Identification of feasible pathway information for c-di-GMP binding proteins in cellulose production

Authors: Syeda Sakira Hassan, Rahul Mangayil, Tommi Aho, Olli Yli-Harja, Matti Karp

Abstract: In this paper, we utilize a machine learning approach to identify the significant pathways for c-di-GMP signaling proteins. The dataset involves gene counts from 12 pathways and 5 essential c-di-GMP binding domains for 1024 bacterial genomes. Two novel approaches, Least absolute shrinkage and selection operator (Lasso) and Random forests, have been applied for analyzing and modeling the dataset. Both approaches show that bacterial chemotaxis is the most essential pathway for c-di-GMP encoding domains. Though popular for feature selection, the strong regularization of Lasso method fails to associate any pathway to MshE domain. Results from the analysis may help to understand and emphasize the supporting pathways involved in bacterial cellulose production. These findings demonstrate the need for a chassis to restrict the behavior or functionality by deactivating the selective pathways in cellulose production.

Submitted 26 April, 2022; originally announced April 2022.

Journal ref: EMBEC & NBC 2017. EMBEC NBC 2017 2017. IFMBE Proceedings, vol 65. Springer, Singapore

arXiv:2109.03592 [pdf, ps, other]

Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems

Authors: Jonathan Vincent, Jing Gong, Martin Karp, Adam Peplinski, Niclas Jansson, Artur Podobas, Andreas Jocksch, Jie Yao, Fazle Hussain, Stefano Markidis, Matts Karlsson, Dirk Pleiter, Erwin Laure, Philipp Schlatter

Abstract: We present new results on the strong parallel scaling for the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case considered consists of a direct numerical simulation of fully-developed turbulent flow in a straight pipe, at two different Reynolds numbers $Re_τ=360$ and $Re_τ=550$, based on friction velocity and pipe radius. The strong scaling is tested on several GPU-enabled HPC systems, including the Swiss Piz Daint system, TACC's Longhorn, Jülich's JUWELS Booster, and Berzelius in Sweden. The performance results show that speed-up between 3-5 can be achieved using the GPU accelerated version compared with the CPU version on these different systems. The run-time for 20 timesteps reduces from 43.5 to 13.2 seconds with increasing the number of GPUs from 64 to 512 for $Re_τ=550$ case on JUWELS Booster system. This illustrates the GPU accelerated version the potential for high throughput. At the same time, the strong scaling limit is significantly larger for GPUs, at about $2000-5000$ elements per rank; compared to about $50-100$ for a CPU-rank.

Submitted 4 November, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: 9 pages, 8 figures. Submitted to HPC-Asia 2022 conference, updated to address reviewers comments

ACM Class: G.4; J.2; C.1

arXiv:2108.12188 [pdf, ps, other]

doi 10.1145/3492805.3492808

A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays

Authors: Martin Karp, Artur Podobas, Tobias Kenter, Niclas Jansson, Christian Plessl, Philipp Schlatter, Stefano Markidis

Abstract: The impending termination of Moore's law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand. In this work, we design a custom FPGA-based accelerator for a computational fluid dynamics (CFD) code. Unlike prior work -- which often focuses on accelerating small kernels -- we target the entire Poisson solver on unstructured meshes based on the high-fidelity spectral element method (SEM) used in modern state-of-the-art CFD systems. We model our accelerator using an analytical performance model based on the I/O cost of the algorithm. We empirically evaluate our accelerator on a state-of-the-art Intel Stratix 10 FPGA in terms of performance and power consumption and contrast it against existing solutions on general-purpose processors (CPUs). Finally, we propose a data movement-reducing technique where we compute geometric factors on the fly, which yields significant (700+ Gflop/s) single-precision performance and an upwards of 2x reduction in runtime for the local evaluation of the Laplace operator. We end the paper by discussing the challenges and opportunities of using reconfigurable architecture in the future, particularly in the light of emerging (not yet available) technologies.

Submitted 2 November, 2021; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: 12 pages, 3 figures, 3 tables, Accepted to HPC Asia 2022

ACM Class: G.4; J.2; C.1

arXiv:2107.01243 [pdf]

Neko: A Modern, Portable, and Scalable Framework for High-Fidelity Computational Fluid Dynamics

Authors: Niclas Jansson, Martin Karp, Artur Podobas, Stefano Markidis, Philipp Schlatter

Abstract: Recent trends and advancement in including more diverse and heterogeneous hardware in High-Performance Computing is challenging software developers in their pursuit for good performance and numerical stability. The well-known maxim "software outlives hardware" may no longer necessarily hold true, and developers are today forced to re-factor their codebases to leverage these powerful new systems. CFD is one of the many application domains affected. In this paper, we present Neko, a portable framework for high-order spectral element flow simulations. Unlike prior works, Neko adopts a modern object-oriented approach, allowing multi-tier abstractions of the solver stack and facilitating hardware backends ranging from general-purpose processors down to exotic vector processors and FPGAs. We show that Neko's performance and accuracy are comparable to NekRS, and thus on-par with Nek5000's successor on modern CPU machines. Furthermore, we develop a performance model, which we use to discuss challenges and opportunities for high-order solvers on emerging hardware.

Submitted 2 July, 2021; originally announced July 2021.

arXiv:2010.13463 [pdf]

High-Performance Spectral Element Methods on Field-Programmable Gate Arrays

Authors: Martin Karp, Artur Podobas, Niclas Jansson, Tobias Kenter, Christian Plessl, Philipp Schlatter, Stefano Markidis

Abstract: Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard's scaling. Today, both these observations are ending, forcing computer users, researchers, and practitioners to abandon the general-purpose architectures' comforts in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate Array (FPGA), which strikes a convenient balance between complexity and performance. In this paper, we study modern FPGAs' applicability in accelerating the Spectral Element Method (SEM) core to many computational fluid dynamics (CFD) applications. We design a custom SEM hardware accelerator operating in double-precision that we empirically evaluate on the latest Stratix 10 GX-series FPGAs and position its performance (and power-efficiency) against state-of-the-art systems such as ARM ThunderX2, NVIDIA Pascal/Volta/Ampere Tesla-series cards, and general-purpose manycore CPUs. Finally, we develop a performance model for our SEM-accelerator, which we use to project future FPGAs' performance and role to accelerate CFD applications, ultimately answering the question: what characteristics would a perfect FPGA for CFD applications have?

Submitted 4 May, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: 10 pages, IEEE International Parallel and Distributed Processing Symposium 2021 (IPDPS'21)

ACM Class: G.4; J.2; C.1

arXiv:2010.10571 [pdf, other]

doi 10.1017/jfm.2020.935

Shock-induced heating and transition to turbulence in a hypersonic boundary layer

Authors: Lin Fu, Michael Karp, Sanjeeb T. Bose, Parviz Moin, Javier Urzay

Abstract: The interaction between an incident shock wave and a Mach-6 undisturbed hypersonic laminar boundary layer over a cold wall is addressed using direct numerical simulations (DNS) and wall-modeled large-eddy simulations (WMLES) at different angles of incidence. At sufficiently high shock-incidence angles, the boundary layer transitions to turbulence via breakdown of near-wall streaks shortly downstream of the shock impingement, without the need of any inflow free-stream disturbances. The transition causes a localized significant increase in the Stanton number and skin-friction coefficient, with high incidence angles augmenting the peak thermomechanical loads in an approximately linear way. Statistical analyses of the boundary layer downstream of the interaction for each case are provided that quantify streamwise spatial variations of the Reynolds analogy factors and indicate a breakdown of the Morkovin's hypothesis near the wall, where velocity and temperature become correlated. A modified strong Reynolds analogy with a fixed turbulent Prandtl number is observed to perform best. Conventional transformations fail at collapsing the mean velocity profiles on the incompressible log law. The WMLES prompts transition and peak heating, delays separation, and advances reattachment, thereby shortening the separation bubble. When the shock leads to transition, WMLES provides predictions of DNS peak thermomechanical loads within $\pm 10\%$ at a computational cost lower than DNS by two orders of magnitude. Downstream of the interaction, in the turbulent boundary layer, WMLES agrees well with DNS results for the Reynolds analogy factor, the mean profiles of velocity and temperature, including the temperature peak, and the temperature/velocity correlation.

Submitted 20 October, 2020; originally announced October 2020.

Comments: 48 pages, 36 figures

MSC Class: 76K05; 76N06; 76N20; 76F06; 76F40; 76F50; 76F65; 76F02

arXiv:2005.13425 [pdf]

Optimization of Tensor-product Operations in Nekbone on GPUs

Authors: Martin Karp, Niclas Jansson, Artur Podobas, Philipp Schlatter, Stefano Markidis

Abstract: In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue this effort and optimize the main tensor-product operation in Nekbone further. Our optimization is done in CUDA and uses a different, 2D, thread structure to make the computations layer by layer. This enables us to use loop unrolling as well as utilize registers and shared memory efficiently. Our implementation is then compared on both the Pascal and Volta GPU architectures to previous GPU versions of Nekbone as well as a measured roofline. The results show that our implementation outperforms previous GPU Nekbone implementations by 6-10%. Compared to the measured roofline, we obtain 77 - 92% of the peak performance for both Nvidia P100 and V100 GPUs for inputs with 1024 - 4096 elements and polynomial degree 9.

Submitted 27 May, 2020; originally announced May 2020.

Comments: 4 pages, 4 figures

ACM Class: G.4; J.2

arXiv:2005.05303 [pdf, other]

doi 10.1017/jfm.2020.902

Cause-and-effect of linear mechanisms sustaining wall turbulence

Authors: Adrián Lozano-Durán, Navid C. Constantinou, Marios-Andreas Nikolaidis, Michael Karp

Abstract: Despite the nonlinear nature of turbulence, there is evidence that part of the energy-transfer mechanisms sustaining wall turbulence can be ascribed to linear processes. The different scenarios stem from linear stability theory and comprise exponential instabilities, neutral modes, transient growth from non-normal operators, and parametric instabilities from temporal mean-flow variations, among others. These mechanisms, each potentially capable of leading to the observed turbulence structure, are rooted in theoretical and conceptual arguments. Whether the flow follows any or a combination of them remains elusive. Here, we evaluate the linear mechanisms responsible for the energy transfer from the streamwise-averaged mean-flow ($\bf U$) to the fluctuating velocities ($\bf u'$). We use cause-and-effect analysis based on interventions. This is achieved by direct numerical simulation of turbulent channel flows at low Reynolds number, in which the energy transfer from $\bf U$ to $\bf u'$ is constrained to preclude a targeted linear mechanism. We show that transient growth is sufficient for sustaining realistic wall turbulence. Self-sustaining turbulence persists when exponential instabilities, neutral modes, and parametric instabilities of the mean flow are suppressed. We further show that a key component of transient growth is the Orr/push-over mechanism induced by spanwise variations of the base flow. Finally, we demonstrate that an ensemble of simulations with various frozen-in-time $\bf U$ arranged so that only transient growth is active, can faithfully represent the energy transfer from $\bf U$ to $\bf u'$ as in realistic turbulence. Our approach provides direct cause-and-effect evaluation of the linear energy-injection mechanisms from $\bf U$ to $\bf u'$ in the fully nonlinear system and simplifies the conceptual model of self-sustaining wall turbulence.

Submitted 7 October, 2020; v1 submitted 9 May, 2020; originally announced May 2020.

Journal ref: J. Fluid Mech., vol. 914, A8, 2021

arXiv:1912.07532 [pdf, other]

doi 10.1088/1742-6596/1522/1/012003

Alternative physics to understand wall turbulence: Navier-Stokes equations with modified linear dynamics

Authors: Adrián Lozano-Durán, Marios-Andreas Nikolaidis, Navid C. Constantinou, Michael Karp

Abstract: Despite the nonlinear nature of wall turbulence, there is evidence that the energy-injection mechanisms sustaining wall turbulence can be ascribed to linear processes. The different scenarios stem from linear stability theory and comprise exponential instabilities from mean-flow inflection points, transient growth from non-normal operators, and parametric instabilities from temporal mean-flow variations, among others. These mechanisms, each potentially capable of leading to the observed turbulence structure, are rooted in simplified theories and conceptual arguments. Whether the flow follows any or a combination of them remains unclear. In the present study, we devise a collection of numerical experiments in which the Navier-Stokes equations are sensibly modified to quantify the role of the different linear mechanisms. This is achieved by direct numerical simulation of turbulent channel flows with constrained energy extraction from the streamwise-averaged mean-flow. We demonstrate that (i) transient growth alone is not sufficient to sustain wall turbulence and (ii) the flow remains turbulent when the exponential instabilities are suppressed. On the other hand, we show that (iii) transient growth combined with the parametric instability of the time-varying mean-flow is able to sustain turbulence.

Submitted 13 December, 2019; originally announced December 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1909.05490

arXiv:1909.05490 [pdf, ps, other]

Wall turbulence without modal instability of the streaks

Authors: Adrián Lozano-Durán, Marios-Andreas Nikolaidis, Navid C. Constantinou, Michael Karp

Abstract: Despite the nonlinear nature of wall turbulence, there is evidence that the mechanism underlying the energy transfer from the mean flow to the turbulent fluctuations can be ascribed to linear processes. One of the most acclaimed linear instabilities for this energy transfer is the modal growth of perturbations with respect to the streamwise-averaged flow (or streaks). Here, we devise a numerical experiment in which the Navier--Stokes equations are sensibly modified to suppress these modal instabilities. Our results demonstrate that wall turbulence is sustained with realistic mean and fluctuating velocities despite the absence of streak instabilities.

Submitted 12 September, 2019; originally announced September 2019.

arXiv:1902.01914 [pdf, ps, other]

Wall turbulence with constrained energy extraction from mean flow

Authors: Adrián Lozano-Durán, Michael Karp, Navid C. Constantinou

Abstract: We study the mechanism of energy injection from the mean flow to the fluctuating velocity necessary to maintain wall turbulence. This process is believed to be correctly represented by the linearized Navier--Stokes equations, and three potential linear mechanisms have been considered, namely, modal instability of the streamwise mean cross-flow $U(y,z,t)$, non-modal transient growth, and non-modal transient growth supported by parametric instability. We have designed three numerical experiments of plane turbulent channel flow with additional forcing terms aiming to neutralize one or various linear mechanisms for energy extraction. From our preliminary experiments, only cases with mean cross-flows capable of supporting modal instabilities were found to sustain turbulence. However, the question whether such a new turbulence complies with the same physical mechanisms as those occurring in actual (unforced) turbulence remains unanswered. On the other hand, cases exclusively supported by transient growth decayed until laminarization.

Submitted 13 February, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

Comments: Corrected typo in author name

Journal ref: Center for Turbulence Research Annual Research Briefs 2018

arXiv:1807.06701 [pdf, other]

Massively Parallel Symmetry Breaking on Sparse Graphs: MIS and Maximal Matching

Authors: Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Richard M. Karp

Abstract: The success of modern parallel paradigms such as MapReduce, Hadoop, or Spark has attracted significant attention to the Massively Parallel Computation (MPC) model over the past few years, especially on graph problems. In this work, we consider symmetry breaking problems of maximal independent set (MIS) and maximal matching (MM), which are among the most intensively studied problems in distributed/parallel computing, in MPC. These problems are known to admit efficient MPC algorithms if the space per machine is near-linear in $n$, the number of vertices in the graph. This space requirement, however, as observed in the literature, is often significantly larger than we can afford, especially when the input graph is sparse. In sharp contrast, in the truly sublinear regime of $n^{1-Ω(1)}$ space per machine, all the known algorithms take $\log^{Ω(1)} n$ rounds, which is considered inefficient. Motivated by this shortcoming, we parametrize our algorithms by the arboricity $α$ of the input graph, which is a well-received measure of its sparsity. We show that both MIS and MM admit $O(\sqrt{\log α}\cdot\log\log α+ \log^2\log n)$ round algorithms using $O(n^ε)$ space per machine for any constant $ε\in (0, 1)$ and using $\widetilde{O}(m)$ total space. Therefore, for the wide range of sparse graphs with small arboricity---such as minor-free graphs, bounded-genus graphs or bounded treewidth graphs---we get an $O(\log^2 \log n)$ round algorithm which exponentially improves prior algorithms. By known reductions, our results also imply a $(1+ε)$-approximation of maximum cardinality matching, a $(2+ε)$-approximation of maximum weighted matching, and a 2-approximation of minimum vertex cover with essentially the same round complexity and memory requirements.

Submitted 6 May, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

Comments: A merger of this paper and the independent and concurrent paper [arxiv:1807.05374] appeared at PODC 2019

arXiv:1111.5572 [pdf, other]

Faster and More Accurate Sequence Alignment with SNAP

Authors: Matei Zaharia, William J. Bolosky, Kristal Curtis, Armando Fox, David Patterson, Scott Shenker, Ion Stoica, Richard M. Karp, Taylor Sittler

Abstract: We present the Scalable Nucleotide Alignment Program (SNAP), a new short and long read aligner that is both more accurate (i.e., aligns more reads with fewer errors) and 10-100x faster than state-of-the-art tools such as BWA. Unlike recent aligners based on the Burrows-Wheeler transform, SNAP uses a simple hash index of short seed sequences from the genome, similar to BLAST's. However, SNAP greatly reduces the number and cost of local alignment checks performed through several measures: it uses longer seeds to reduce the false positive locations considered, leverages larger memory capacities to speed index lookup, and excludes most candidate locations without fully computing their edit distance to the read. The result is an algorithm that scales well for reads from one hundred to thousands of bases long and provides a rich error model that can match classes of mutations (e.g., longer indels) that today's fast aligners ignore. We calculate that SNAP can align a dataset with 30x coverage of a human genome in less than an hour for a cost of $2 on Amazon EC2, with higher accuracy than BWA. Finally, we describe ongoing work to further improve SNAP.

Submitted 23 November, 2011; originally announced November 2011.

arXiv:1009.0909 [pdf, other]

Comparing Pedigree Graphs

Authors: Bonnie Kirkpatrick, Yakir Reshef, Hilary Finucane, Haitao Jiang, Binhai Zhu, Richard M. Karp

Abstract: Pedigree graphs, or family trees, are typically constructed by an expensive process of examining genealogical records to determine which pairs of individuals are parent and child. New methods to automate this process take as input genetic data from a set of extant individuals and reconstruct ancestral individuals. There is a great need to evaluate the quality of these methods by comparing the estimated pedigree to the true pedigree. In this paper, we consider two main pedigree comparison problems. The first is the pedigree isomorphism problem, for which we present a linear-time algorithm for leaf-labeled pedigrees. The second is the pedigree edit distance problem, for which we present 1) several algorithms that are fast and exact in various special cases, and 2) a general, randomized heuristic algorithm. In the negative direction, we first prove that the pedigree isomorphism problem is as hard as the general graph isomorphism problem, and that the sub-pedigree isomorphism problem is NP-hard. We then show that the pedigree edit distance problem is APX-hard in general and NP-hard on leaf-labeled pedigrees. We use simulated pedigrees to compare our edit-distance algorithms to each other as well as to a branch-and-bound algorithm that always finds an optimal solution.

Submitted 18 October, 2011; v1 submitted 5 September, 2010; originally announced September 2010.

arXiv:0707.1532 [pdf, ps, other]

Sorting and Selection in Posets

Authors: Constantinos Daskalakis, Richard M. Karp, Elchanan Mossel, Samantha Riesenfeld, Elad Verbin

Abstract: Classical problems of sorting and searching assume an underlying linear ordering of the objects being compared. In this paper, we study a more general setting, in which some pairs of objects are incomparable. This generalization is relevant in applications related to rankings in sports, college admissions, or conference submissions. It also has potential applications in biology, such as comparing the evolutionary fitness of different strains of bacteria, or understanding input-output relations among a set of metabolic reactions or the causal influences among a set of interacting genes or proteins. Our results improve and extend results from two decades ago of Faigle and Turán. A measure of complexity of a partially ordered set (poset) is its width. Our algorithms obtain information about a poset by queries that compare two elements. We present an algorithm that sorts, i.e. completely identifies, a width w poset of size n and has query complexity O(wn + nlog(n)), which is within a constant factor of the information-theoretic lower bound. We also show that a variant of Mergesort has query complexity O(wn(log(n/w))) and total complexity O((w^2)nlog(n/w)). Faigle and Turán have shown that the sorting problem has query complexity O(wn(log(n/w))) but did not address its total complexity. For the related problem of determining the minimal elements of a poset, we give efficient deterministic and randomized algorithms with O(wn) query and total complexity, along with matching lower bounds for the query complexity up to a factor of 2. We generalize these results to the k-selection problem of determining the elements of height at most k. We also derive upper bounds on the total complexity of some other problems of a similar flavor.

Submitted 10 July, 2007; originally announced July 2007.

Comments: 24 pages

ACM Class: F.2.2; G.2.1; G.2.2

arXiv:q-bio/0702001 [pdf, ps, other]

Comparing Protein Interaction Networks via a Graph Match-and-Split Algorithm

Authors: Manikandan Narayanan, Richard M. Karp

Abstract: We present a method that compares the protein interaction networks of two species to detect functionally similar (conserved) protein modules between them. The method is based on an algorithm we developed to identify matching subgraphs between two graphs. Unlike previous network comparison methods, our algorithm has provable guarantees on correctness and efficiency. Our algorithm framework also admits quite general connectivity and local matching criteria that define when two subgraphs match and constitute a conserved module. We apply our method to pairwise comparisons of the yeast protein network with the human, fruit fly and nematode worm protein networks, using a lenient criterion based on connectedness and matching edges, coupled with a betweenness clustering heuristic. We evaluate the detected conserved modules against reference yeast protein complexes using sensitivity and specificity measures. In these evaluations, our method performs competitively with and sometimes better than two previous network comparison methods. Further, under some conditions (proper homolog and species selection), our method performs better than a popular single-species clustering method. Beyond these evaluations, we discuss the biology of a couple of conserved modules detected by our method. We demonstrate the utility of network comparison for transferring annotations from yeast proteins to human ones, and validate the predicted annotations.

Submitted 1 February, 2007; originally announced February 2007.

Comments: 15 pages, 4 figures, 6 tables. Supplemental text available at http://www.cs.berkeley.edu/~nmani/mas-supplement.pdf

arXiv:cs/0702014 [pdf, ps, other]

doi 10.1109/TIT.2008.926452

Probabilistic Analysis of Linear Programming Decoding

Authors: Constantinos Daskalakis, Alexandros G. Dimakis, Richard M. Karp, Martin J. Wainwright

Abstract: We initiate the probabilistic analysis of linear programming (LP) decoding of low-density parity-check (LDPC) codes. Specifically, we show that for a random LDPC code ensemble, the linear programming decoder of Feldman et al. succeeds in correcting a constant fraction of errors with high probability. The fraction of correctable errors guaranteed by our analysis surpasses previous non-asymptotic results for LDPC codes, and in particular exceeds the best previous finite-length result on LP decoding by a factor greater than ten. This improvement stems in part from our analysis of probabilistic bit-flipping channels, as opposed to adversarial channels. At the core of our analysis is a novel combinatorial characterization of LP decoding success, based on the notion of a generalized matching. An interesting by-product of our analysis is to establish the existence of "probabilistic expansion" in random bipartite graphs, in which one requires only that almost every (as opposed to every) set of a certain size expands, for sets much larger than in the classical worst-case setting.

Submitted 10 March, 2008; v1 submitted 2 February, 2007; originally announced February 2007.

Comments: To appear, IEEE Transactions on Information Theory, (replaces shorter version that appeared in SODA'07)

Showing 1–22 of 22 results for author: Karp, M