-
Interpolation-based immersogeometric analysis methods for multi-material and multi-physics problems
Authors:
Jennifer E. Fromm,
Nils Wunsch,
Kurt Maute,
John A. Evans,
Jiun-Shyan Chen
Abstract:
Immersed boundary methods are high-order accurate computational tools used to model geometrically complex problems in computational mechanics. While traditional finite element methods require the construction of high-quality boundary-fitted meshes, immersed boundary methods instead embed the computational domain in a background grid. Interpolation-based immersed boundary methods augment existing finite element software to non-invasively implement immersed boundary capabilities through extraction. Extraction interpolates the background basis as a linear combination of Lagrange polynomials defined on a foreground mesh, creating an interpolated basis that can be easily integrated by existing methods. This work extends the interpolation-based immersed boundary method to multi-material and multi-physics problems. Beginning from level-set descriptions of domain geometries, Heaviside enrichment is implemented to accommodate discontinuities in state variable fields across material interfaces. Adaptive refinement with truncated hierarchical B-splines is used to both improve interface geometry representations and resolve large solution gradients near interfaces. Multi-physics problems typically involve coupled fields where each field has unique discretization requirements. This work presents a novel discretization method for coupled problems through the application of extraction, using a single foreground mesh for all fields. Numerical examples illustrate optimal convergence rates for this method in both 2D and 3D, for heat conduction, linear elasticity, and a coupled thermo-mechanical problem. The utility of this method is demonstrated through image-based analysis of a composite sample, where in addition to circumventing typical meshing difficulties, this method reduces the required degrees of freedom compared to classical boundary-fitted finite element methods.
Submitted 24 February, 2024;
originally announced February 2024.
-
Interpolation-based immersed finite element and isogeometric analysis
Authors:
Jennifer E. Fromm,
Nils Wunsch,
Ru Xiang,
Han Zhao,
Kurt Maute,
John A. Evans,
David Kamensky
Abstract:
We introduce a new paradigm for immersed finite element and isogeometric methods based on interpolating function spaces from an unfitted background mesh into Lagrange finite element spaces defined on a foreground mesh that captures the domain geometry but is otherwise subject to minimal constraints on element quality or connectivity. This is a generalization of the concept of Lagrange extraction from the isogeometric analysis literature and also related to certain variants of the finite cell and material point methods. Crucially, the interpolation may be approximate without sacrificing high-order convergence rates, which distinguishes the present method from existing finite cell, CutFEM, and immersogeometric approaches. The interpolation paradigm also permits non-invasive reuse of existing finite element software for immersed analysis. We analyze the properties of the interpolation-based immersed paradigm for a model problem and implement it on top of the open-source FEniCS finite element software, to apply it to a variety of problems in fluid, solid, and structural mechanics where we demonstrate high-order accuracy and applicability to practical geometries like trimmed spline patches.
Submitted 14 September, 2022;
originally announced September 2022.
-
Automated Backend-Aware Post-Training Quantization
Authors:
Ziheng Jiang,
Animesh Jain,
Andrew Liu,
Josh Fromm,
Chengqian Ma,
Tianqi Chen,
Luis Ceze
Abstract:
Quantization is a key technique to reduce the resource requirement and improve the performance of neural network deployment. However, different hardware backends such as x86 CPU, NVIDIA GPU, ARM CPU, and accelerators may demand different implementations for quantized networks. This diversity calls for specialized post-training quantization pipelines to be built for each hardware target, an engineering effort that is often too large for developers to keep up with. We tackle this problem with an automated post-training quantization framework called HAGO. HAGO provides a set of general quantization graph transformations based on a user-defined hardware specification and implements a search mechanism to find the optimal quantization strategy while satisfying hardware constraints for any model. We observe that HAGO achieves speedups of 2.09x, 1.97x, and 2.48x on Intel Xeon Cascade Lake CPUs, NVIDIA Tesla T4 GPUs, and ARM Cortex-A CPUs on Raspberry Pi4, respectively, relative to full precision, while maintaining the highest reported post-training quantization accuracy in each case.
Submitted 27 March, 2021;
originally announced March 2021.
-
SplitSR: An End-to-End Approach to Super-Resolution on Mobile Devices
Authors:
Xin Liu,
Yuang Li,
Josh Fromm,
Yuntao Wang,
Ziheng Jiang,
Alex Mariakakis,
Shwetak Patel
Abstract:
Super-resolution (SR) is a coveted image processing technique for mobile apps ranging from the basic camera apps to mobile health. Existing SR algorithms rely on deep learning models with significant memory requirements, so they have yet to be deployed on mobile devices and instead operate in the cloud to achieve feasible inference time. This shortcoming prevents existing SR methods from being used in applications that require near real-time latency. In this work, we demonstrate state-of-the-art latency and accuracy for on-device super-resolution using a novel hybrid architecture called SplitSR and a novel lightweight residual block called SplitSRBlock. The SplitSRBlock supports channel-splitting, allowing the residual blocks to retain spatial information while reducing the computation in the channel dimension. SplitSR has a hybrid design consisting of standard convolutional blocks and lightweight residual blocks, allowing people to tune SplitSR for their computational budget. We evaluate our system on a low-end ARM CPU, demonstrating both higher accuracy and up to 5 times faster inference than previous approaches. We then deploy our model onto a smartphone in an app called ZoomSR to demonstrate the first-ever instance of on-device, deep learning-based SR. We conducted a user study with 15 participants to have them assess the perceived quality of images that were post-processed by SplitSR. Relative to bilinear interpolation -- the existing standard for on-device SR -- participants showed a statistically significant preference when looking at both images (Z=-9.270, p<0.01) and text (Z=-6.486, p<0.01).
Submitted 20 January, 2021;
originally announced January 2021.
-
MetaPhys: Few-Shot Adaptation for Non-Contact Physiological Measurement
Authors:
Xin Liu,
Ziheng Jiang,
Josh Fromm,
Xuhai Xu,
Shwetak Patel,
Daniel McDuff
Abstract:
There are large individual differences in physiological processes, making designing personalized health sensing algorithms challenging. Existing machine learning systems struggle to generalize well to unseen subjects or contexts and can often contain problematic biases. Video-based physiological measurement is not an exception. Therefore, learning personalized or customized models from a small number of unlabeled samples is very attractive as it would allow fast calibrations to improve generalization and help correct biases. In this paper, we present a novel meta-learning approach called MetaPhys for personalized video-based cardiac measurement for contactless pulse and heart rate monitoring. Our method uses only 18 seconds of video for customization and works effectively in both supervised and unsupervised manners. We evaluate our proposed approach on two benchmark datasets and demonstrate superior performance in cross-dataset evaluation with substantial reductions (42% to 44%) in errors compared with state-of-the-art approaches. We also demonstrate that our proposed method significantly helps reduce the bias in skin type.
Submitted 5 March, 2021; v1 submitted 5 October, 2020;
originally announced October 2020.
-
Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement
Authors:
Xin Liu,
Josh Fromm,
Shwetak Patel,
Daniel McDuff
Abstract:
Telehealth and remote health monitoring have become increasingly important during the SARS-CoV-2 pandemic and it is widely expected that this will have a lasting impact on healthcare practices. These tools can help reduce the risk of exposing patients and medical staff to infection, make healthcare services more accessible, and allow providers to see more patients. However, objective measurement of vital signs is challenging without direct contact with a patient. We present a video-based and on-device optical cardiopulmonary vital sign measurement approach. It leverages a novel multi-task temporal shift convolutional attention network (MTTS-CAN) and enables real-time cardiovascular and respiratory measurements on mobile platforms. We evaluate our system on an Advanced RISC Machine (ARM) CPU and achieve state-of-the-art accuracy while running at over 150 frames per second which enables real-time applications. Systematic experimentation on large benchmark datasets reveals that our approach leads to substantial (20%-50%) reductions in error and generalizes well across datasets.
Submitted 28 February, 2021; v1 submitted 6 June, 2020;
originally announced June 2020.
-
The Potential of Social Media Analytics for Improving Social Media Communication of Emergency Agencies
Authors:
Milad Mirbabaie,
Jennifer Fromm,
Simone Löppenberg,
Sophie Meinig,
Matthias Reuße
Abstract:
A growing number of people use social media to seek information or coordinate relief activities in times of crisis. Thus, social media is increasingly deployed by emergency agencies as well to reach more people in crisis situations. However, the large amount of available data on social media could also be used by emergency agencies to understand how they are perceived by the public and to improve their communication. In this study, we examined the Twitter communication about the German emergency agency "Johanniter-Unfall-Hilfe" by conducting a frequency, sentiment, social network and content analysis. The results reveal that a right-wing political cluster politically instrumentalised an incident related to this agency. Furthermore, some individuals used social media to express criticism. It can be concluded that the use of social media analytics in the daily routine of emergency management professionals can be beneficial for improving their social media communication strategy.
Submitted 18 April, 2020;
originally announced April 2020.
-
A Hardware-Software Blueprint for Flexible Deep Learning Specialization
Authors:
Thierry Moreau,
Tianqi Chen,
Luis Vega,
Jared Roesch,
Eddie Yan,
Lianmin Zheng,
Josh Fromm,
Ziheng Jiang,
Luis Ceze,
Carlos Guestrin,
Arvind Krishnamurthy
Abstract:
Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility. Changes in algorithms, models, operators, or numerical systems threaten the viability of specialized hardware accelerators. We propose VTA, a programmable deep learning architecture template designed to be extensible in the face of evolving workloads. VTA achieves this flexibility via a parametrizable architecture, two-level ISA, and a JIT compiler. The two-level ISA is based on (1) a task-ISA that explicitly orchestrates concurrent compute and memory tasks and (2) a microcode-ISA which implements a wide variety of operators with single-cycle tensor-tensor operations. Next, we propose a runtime system equipped with a JIT compiler for flexible code-generation and heterogeneous execution that enables effective use of the VTA architecture. VTA is integrated and open-sourced into Apache TVM, a state-of-the-art deep learning compilation stack that provides flexibility for diverse models and divergent hardware backends. We propose a flow that performs design space exploration to generate a customized hardware architecture and software operator library that can be leveraged by mainstream learning frameworks. We demonstrate our approach by deploying optimized deep learning models used for object classification and style transfer on edge-class FPGAs.
Submitted 22 April, 2019; v1 submitted 11 July, 2018;
originally announced July 2018.
-
Heterogeneous Bitwidth Binarization in Convolutional Neural Networks
Authors:
Josh Fromm,
Shwetak Patel,
Matthai Philipose
Abstract:
Recent work has shown that fast, compact low-bitwidth neural networks can be surprisingly accurate. These networks use homogeneous binarization: all parameters in each layer or (more commonly) the whole model have the same low bitwidth (e.g., 2 bits). However, modern hardware allows efficient designs where each arithmetic instruction can have a custom bitwidth, motivating heterogeneous binarization, where every parameter in the network may have a different bitwidth. In this paper, we show that it is feasible and useful to select bitwidths at the parameter granularity during training. For instance, heterogeneously quantized versions of modern networks such as AlexNet and MobileNet, with the right mix of 1-, 2- and 3-bit parameters that average to just 1.4 bits, can equal the accuracy of homogeneous 2-bit versions of these networks. Further, we provide analyses to show that the heterogeneously binarized systems yield FPGA- and ASIC-based implementations that are correspondingly more efficient in both circuit area and energy consumption than their homogeneous counterparts.
Submitted 31 October, 2018; v1 submitted 25 May, 2018;
originally announced May 2018.
-
Precision Scaling of Neural Networks for Efficient Audio Processing
Authors:
Jong Hwan Ko,
Josh Fromm,
Matthai Philipose,
Ivan Tashev,
Shuayb Zarar
Abstract:
While deep neural networks have shown powerful performance in many audio applications, their large computation and memory demands have been a challenge for real-time processing. In this paper, we study the impact of scaling the precision of neural networks on the performance of two common audio processing tasks, namely, voice-activity detection and single-channel speech enhancement. We determine the optimal pair of weight/neuron bit precision by exploring its impact on both the performance and processing time. Through experiments conducted with real user data, we demonstrate that deep neural networks that use lower bit precision significantly reduce the processing time (up to 30x). However, their performance impact is low (< 3.14%) only in the case of classification tasks such as those present in voice activity detection.
Submitted 4 December, 2017;
originally announced December 2017.
-
On Engineering and Emergence
Authors:
Jochen Fromm
Abstract:
The engineering and design of self-organizing systems with emergent properties is a long-standing problem in the field of complex and distributed systems, for example in the engineering of self-organizing Multi-Agent Systems. The problem of combining engineering with emergence - to find a simple rule for a complex pattern - equals the problem of science in general. Therefore the answers are similar, and the scientific method is the general solution to the problem of engineering complex systems.
Submitted 3 January, 2006;
originally announced January 2006.
-
Ten Questions about Emergence
Authors:
Jochen Fromm
Abstract:
Self-organization is of growing importance for large distributed computing systems. In these systems, central control and manual management are exceedingly difficult or even impossible. Emergence is widely recognized as the core principle behind self-organization. Therefore, the idea of using both principles to control and organize large-scale distributed systems is very attractive and not so far off.
Yet there are many open questions about emergence and self-organization, ranging from a clear definition and scientific understanding to the possible applications in engineering and technology, including the limitations of both concepts. Self-organizing systems with emergent properties are highly desirable, but also very challenging. We pose ten central questions about emergence, give preliminary answers, and identify four basic limits of self-organization: a size limit, a place limit, a complexity limit and finally a combinatorial limit.
Submitted 27 September, 2005;
originally announced September 2005.
-
Types and Forms of Emergence
Authors:
Jochen Fromm
Abstract:
The knowledge of the different types of emergence is essential if we want to understand and master complex systems in science and engineering, respectively. This paper specifies a universal taxonomy and comprehensive classification of the major types and forms of emergence in Multi-Agent Systems, from simple types of intentional and predictable emergence in machines to more complex forms of weak, multiple and strong emergence.
Submitted 13 June, 2005;
originally announced June 2005.
-
Extended Iterative Scheme for QCD: Three-point Vertices
Authors:
L. Driesen,
J. Fromm,
J. Kuhrs,
M. Stingl
Abstract:
In the framework of a generalized iterative scheme introduced previously to account for the non-analytic coupling dependence associated with the renormalization-group invariant mass scale Lambda, we establish the self-consistency equations of the extended Feynman rules (Lambda-modified vertices of zeroth perturbative order) for the three-gluon vertex, the two ghost vertices, and the two vertices of massless quarks. Calculations are performed to one-loop-order, in Landau gauge, and at the lowest approximation level (r=1) of interest for QCD. We discuss the phenomenon of compensating poles inherent in these equations, by which the formalism automatically cancels unphysical poles on internal lines, and the role of composite-operator information in the form of equation-of-motion condensate conditions. The observed near decoupling of the four-gluon conditions permits a solution to the 2-and-3-point conditions within an effective one-parameter freedom. There exists a parameter range in which one solution has all vertex coefficients real, as required for a physical solution, and a narrower range in which the transverse-gluon and massless-quark propagators both exhibit complex-conjugate pole pairs.
Submitted 25 August, 1998;
originally announced August 1998.