-
Curved Space-Filling Tiles Using Voronoi Decomposition with Line, and Curve Segments Closed Under Wallpaper Symmetries
Authors:
Haard Panchal,
Ergun Akleman,
Vinayak Krishnamurthy,
Tolga Talha Yildiz,
Varda Grover
Abstract:
In this paper, we present a new approach to obtain symmetric tiles with curved edges. Our approach is based on using higher-order Voronoi sites that are closed under wallpaper symmetries. The resulting Voronoi tessellations provide us with symmetric tiles with curved edges. We have developed a web application that provides real-time tile design. Our application can be found at https://voronoi.viz.tamu.edu. One of our key findings in this paper is that not all symmetry operations are useful for creating curved tiles. In particular, all symmetries that use a mirror operation produce straight lines that are useless for creating new tiles. This result is interesting because it suggests that we need to avoid mirror transformations to produce unusual space-filling tiles in 2D and 3D using Voronoi tessellations.
Submitted 23 October, 2023;
originally announced October 2023.
-
Axon: A Language for Dynamic Shapes in Deep Learning Graphs
Authors:
Alexander Collins,
Vinod Grover
Abstract:
Axon is a language that enables shape and rank inference for tensors in Deep Learning graphs. It aims to make shapes implicit and inferred, in a similar manner to how types are implicit and inferred in many functional programming languages. Tensor dimensions are represented by expressions consisting of symbolic variables, constants, and arithmetic operators. Tensor shapes can be expressed as either a sequence of these dimension expressions, as a symbolic variable, or as an appending of other shapes. This allows complex constraints on shapes to be expressed. Axon is functional in style, with a type system similar to that of Standard ML, extended to include shape information. It provides a suite of built-in operators over tensors, including pointwise arithmetic operators, maps, reduction, loops and user-defined functions. We describe a shape inference algorithm based on constraint solving which infers information about shapes, from both shape information provided by the programmer and the structure of the program. This allows fully automatic inference of the shapes of tensors for complex Deep Learning graphs. This approach reduces programmer effort when specifying graphs, as tensor shapes are not explicit, allows composition of Deep Learning graphs while maintaining input and output tensor shape compatibility, and aids in automated error detection by identifying shape mismatches at runtime.
Submitted 5 October, 2022;
originally announced October 2022.
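As a rough illustration of the constraint-based shape inference this abstract describes, the following Python sketch unifies symbolic and constant dimensions across two operators. It is not Axon syntax and not the paper's algorithm; the function names (`resolve`, `unify_dim`, `matmul_shape`, `pointwise_shape`) and the representation of dimensions as ints or strings are assumptions made for this example, and arithmetic dimension expressions are left out.

```python
# Hypothetical sketch: NOT Axon syntax and NOT the paper's algorithm.
# A dimension is an int (constant) or a str (symbolic variable).


def resolve(dim, subst):
    """Follow variable bindings until a constant or an unbound variable is reached."""
    while isinstance(dim, str) and dim in subst:
        dim = subst[dim]
    return dim


def unify_dim(a, b, subst):
    """Record the constraint a == b; raise if two constants disagree."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return
    if isinstance(a, str):
        subst[a] = b
    elif isinstance(b, str):
        subst[b] = a
    else:
        raise ValueError(f"shape mismatch: {a} != {b}")


def matmul_shape(lhs, rhs, subst):
    """Shape rule for a rank-2 matrix product: (m, k) @ (k, n) -> (m, n)."""
    (m, k1), (k2, n) = lhs, rhs
    unify_dim(k1, k2, subst)
    return (m, n)


def pointwise_shape(lhs, rhs, subst):
    """Shape rule for a pointwise operator: both operands share one shape."""
    if len(lhs) != len(rhs):
        raise ValueError(f"rank mismatch: {lhs} vs {rhs}")
    for a, b in zip(lhs, rhs):
        unify_dim(a, b, subst)
    return lhs


def concrete(shape, subst):
    """Replace every bound variable in a shape by its resolved value."""
    return tuple(resolve(d, subst) for d in shape)


subst = {}
hidden = matmul_shape(("batch", 784), ("in", "out"), subst)  # binds in = 784
out = pointwise_shape(hidden, ("batch", 10), subst)          # binds out = 10
inferred = concrete(out, subst)                              # ("batch", 10)
```

The programmer states only some dimensions (784 and 10 here); the remaining ones follow from the structure of the graph, and an inconsistent pair of constants is reported as an error rather than silently accepted.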
-
Probabilistic Programming with CuPPL
Authors:
Alexander Collins,
Vinod Grover
Abstract:
Probabilistic Programming Languages (PPLs) are a powerful tool in machine learning, allowing highly expressive generative models to be expressed succinctly. They couple complex inference algorithms, implemented by the language, with an expressive modelling language that allows a user to implement any computable function as the generative model. Such languages are usually implemented on top of existing high-level programming languages and do not make use of hardware accelerators. PPLs that do make use of accelerators exist, but restrict the expressivity of the language in order to do so. In this paper, we present a language and toolchain that generates highly efficient code for both CPUs and GPUs. The language is functional in style, and the toolchain is built on top of LLVM. Our implementation uses delimited continuations on CPU to perform inference, and custom CUDA codes on GPU. We obtain significant speedups across a suite of PPL workloads, compared to other state-of-the-art approaches on CPU. Furthermore, our compiler can also generate efficient code that runs on CUDA GPUs.
Submitted 16 October, 2020;
originally announced October 2020.
-
Automatic Kernel Generation for Volta Tensor Cores
Authors:
Somashekaracharya G. Bhaskaracharya,
Julien Demouth,
Vinod Grover
Abstract:
A commonly occurring computation idiom in neural networks is to perform some pointwise operations on the result of a matrix multiplication. Such a sequence of operations is typically represented as a computation graph in deep learning compilers. When compiling to a GPU target, these computations can be individually mapped to manually tuned implementations provided by libraries such as cuBLAS and cuDNN. These libraries also provide off-the-shelf support for targeting tensor cores in NVIDIA GPUs, which can lead to huge performance boosts through their specialized support for mixed-precision matrix math. Alternatively, tensor cores can be programmed directly using CUDA APIs or inline assembly instructions, which opens up the possibility of generating efficient CUDA kernels automatically for such computations.
Automatic kernel generation is particularly crucial when it is beneficial to generate efficient code for an entire computation graph by fusing several operations into a single device function instead of invoking a separate kernel for each of them. Polyhedral compilation techniques provide a systematic approach for the analysis and transformation of a sequence of affine loop-nests. In this paper, we describe a polyhedral approach to generate efficient CUDA kernels for matrix multiplication using inline assembly instructions for programming tensor cores on NVIDIA Volta GPUs. Furthermore, we build on this approach to generate fused kernels for computation sequences involving matrix multiplication and pointwise operations such as bias addition, ReLU activation, etc. Experimental evaluation of these techniques shows that automatically generated kernels can provide significantly better performance than manually tuned library implementations, with speedups ranging up to 2.55X.
Submitted 1 August, 2020; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs
Authors:
Bastian Hagedorn,
Archibald Samuel Elliott,
Henrik Barthels,
Rastislav Bodik,
Vinod Grover
Abstract:
Achieving high-performance GPU kernels requires optimizing algorithm implementations to the targeted GPU architecture. It is of utmost importance to fully use the compute and memory hierarchy, as well as available specialised hardware. Currently, vendor libraries like cuBLAS and cuDNN provide the best performing implementations of GPU algorithms. However, the task of the library programmer is incredibly challenging: for each provided algorithm, high-performance implementations have to be developed for all commonly used architectures, input sizes, and different storage formats. These implementations are generally provided as optimized assembly code because performance-critical architectural features are only exposed at this level. This prevents reuse between different implementations of even the same algorithm, as simple differences can have major effects on low-level implementation details. In this paper we introduce Fireiron, a DSL and compiler which allows the specification of high-performance GPU implementations as compositions of simple and reusable building blocks. We show how to use Fireiron to optimize matrix multiplication implementations, achieving performance matching hand-coded CUDA kernels, even when using specialised hardware such as NVIDIA Tensor Cores, and outperforming state-of-the-art implementations provided by cuBLAS by more than 2x.
Submitted 13 March, 2020;
originally announced March 2020.
-
Radiation tolerance: Nano triumphs bulk
Authors:
Parswajit Kalita,
Santanu Ghosh,
Gaëlle Gutierrez,
Parasmani Rajput,
Vinita Grover,
Gaël Sattonnay,
Devesh K. Avasthi
Abstract:
Materials are subjected to energetic particles in a number of radiation environments, and are hence prone to undesirable (radiation) damage. We report here the superiority of the nanocrystalline phase over bulk for radiation tolerance under simultaneous irradiation with high energy (electronic energy loss (Se) dominant) and low energy (nuclear energy loss (Sn) dominant) particles. Nano-crystalline yttria stabilized zirconia is found to exhibit less radiation damage (viz. degradation in crystallinity), when compared to its bulk-like counterpart, against simultaneous irradiation with high energy 27 MeV Fe and low energy 900 keV I ions. This is interpreted within the framework of the thermal spike model after considering (i) the fact that there is essentially no spatial and time overlap between the damage events of the two simultaneous ion beams, and (ii) the influence of grain size on the radiation damage against separate Sn and Se. The present work, besides being of keen interest for fundamental understanding of ion-material interactions, also paves the way for the potential application of nanocrystalline materials in the nuclear industry where such simultaneous irradiations are encountered.
Submitted 24 December, 2019;
originally announced December 2019.
-
Automatic acceleration of Numpy applications on GPUs and multicore CPUs
Authors:
Mahesh Ravishankar,
Vinod Grover
Abstract:
Frameworks like Numpy are a popular choice for application developers from varied fields such as image processing to bio-informatics to machine learning. Numpy is often used to develop prototypes or for deployment since it provides efficient implementation for operations involving arrays. Such an approach requires every operation to be executed eagerly. The result of each operation needs to be stored in memory which increases the memory footprint of the application. It also increases the bandwidth requirements since all uses must read from this memory. We propose an approach that records the sequence of Numpy operations for deferred execution. When the values of an array are needed, for example when the values are stored to disk or displayed on screen, the sequence of operations required to compute these values is compiled into a function and executed. This removes the need to store/load intermediates in slow memory, resulting in better performance. In cases where the library implementation is more efficient (like matrix-matrix multiply), those are used instead. The approach also allows us to seamlessly target both multicore CPUs and NVIDIA GPUs, thereby porting the Numpy application to these architectures without changing the user program. The benefit of the approach is evaluated by targeting computation samples from various domains and on average an order of magnitude performance improvement over Numpy is observed.
Submitted 11 January, 2019;
originally announced January 2019.
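The deferred-execution idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Lazy` class and its methods are hypothetical, plain Python lists stand in for arrays, only elementwise `+` and `*` are recorded, and the recorded expression is interpreted per element rather than compiled to CPU or GPU code as the paper describes.

```python
# Hypothetical sketch of deferred execution; plain lists stand in for arrays.
import operator


class Lazy:
    """Records an elementwise expression instead of computing it eagerly."""

    def __init__(self, op=None, args=(), data=None):
        self.op, self.args, self.data = op, args, data

    @staticmethod
    def wrap(value):
        if isinstance(value, Lazy):
            return value
        if isinstance(value, (int, float)):
            return Lazy(data=value)    # scalar leaf, reused for every element
        return Lazy(data=list(value))  # array leaf

    def __add__(self, other):
        return Lazy(operator.add, (self, Lazy.wrap(other)))

    def __mul__(self, other):
        return Lazy(operator.mul, (self, Lazy.wrap(other)))

    def _length(self):
        """Common length of all array leaves, or None if there are only scalars."""
        if self.op is None:
            return len(self.data) if isinstance(self.data, list) else None
        lengths = {n for n in (a._length() for a in self.args) if n is not None}
        if len(lengths) > 1:
            raise ValueError(f"mismatched array lengths: {sorted(lengths)}")
        return lengths.pop() if lengths else None

    def _at(self, i):
        """Evaluate the recorded expression for element i only."""
        if self.op is None:
            return self.data[i] if isinstance(self.data, list) else self.data
        return self.op(*(a._at(i) for a in self.args))

    def evaluate(self):
        """Materialize the result in one pass, without intermediate arrays."""
        n = self._length()
        if n is None:
            raise ValueError("expression contains no array operand")
        return [self._at(i) for i in range(n)]


a = Lazy.wrap([1.0, 2.0, 3.0])
b = Lazy.wrap([4.0, 5.0, 6.0])
expr = (a + b) * 2 + a    # nothing is computed yet; only the graph is built
result = expr.evaluate()  # values are produced when they are actually needed
```

Building `expr` allocates no result arrays; the temporaries for `a + b` and `(a + b) * 2` never exist as arrays, which is the memory and bandwidth saving the abstract attributes to deferring execution.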
-
Enhanced radiation tolerance of YSZ at high temperature against swift heavy ions: key role of interplay between material microstructure and irradiation temperature
Authors:
Parswajit Kalita,
Santanu Ghosh,
Udai B. Singh,
Pawan K. Kulriya,
Vinita Grover,
Rakesh Shukla,
A. K. Tyagi,
Gael Sattonnay,
Devesh K. Avasthi
Abstract:
Yttria stabilized Zirconia (YSZ) pellets with different crystallite sizes were irradiated with 80 MeV Ag$^{6+}$ ions at room temperature and 1000 K to understand the role of crystallite size/material microstructure and irradiation temperature on the radiation tolerance against high electronic energy loss (S$_e$). X-ray diffraction and Raman spectroscopy measurements reveal that, irrespective of the irradiation temperature, the nano-crystalline samples suffered more damage as compared to the bulk-like sample. A reduction in the irradiation damage, i.e., an improvement in the radiation tolerance, was observed for all the samples irradiated at 1000 K. The reduction in the damage, however, was remarkably higher for the two nano-crystalline samples compared to the bulk-like sample, and hence the difference in the damage between the bulk-like and nano-crystalline samples was also significantly lower at 1000 K than that at room temperature. The irradiation damage, against S$_e$, was thus found to be critically dependent on the interplay between the irradiation temperature and crystallite size. These results are explained with the help of detailed theoretical calculations/simulations based on the 'inelastic thermal spike' model by taking into consideration the combined effect of crystallite size and environmental (irradiation) temperature on the electron-phonon coupling factor and lattice thermal conductivity (and hence on the resulting thermal spike). Our results are crucial from the fundamental perspective of comprehending the size- and temperature-dependent radiation damage against S$_e$, and also for a number of applications, in various radiation environments, where nano-materials are being envisioned for use.
Submitted 27 July, 2018; v1 submitted 26 June, 2018;
originally announced June 2018.
-
Geometrically frustrated GdInO$_3$: An exotic system to study negative thermal expansion and spin-lattice coupling
Authors:
Barnita Paul,
Swastika Chatterjee,
Anushree Roy,
A. Midya,
P. Mandal,
Vinita Grover,
A. K. Tyagi
Abstract:
In this article, we report negative thermal expansion and spin frustration in hexagonal GdInO$_{3}$. Rietveld refinement of the XRD patterns reveals that the negative thermal expansion in the temperature range of 50-100 K stems from the triangular lattice of Gd$^{3+}$ ions. At low temperature, the downward deviation of the inverse susceptibility ($χ^{-1}$) vs. $T$ plot from the Curie-Weiss law indicates spin frustration which inhibits long-range magnetic ordering down to 2 K. Magnetostriction measurements clearly demonstrate a strong spin-lattice coupling. Low temperature anomalous phonon softening, as obtained from temperature dependent Raman measurements, also reveals the same. Our experimental observations are supported by first principles density functional theory calculations of the electronic and phonon dispersion of GdInO$_3$. The calculations suggest that the GdInO$_3$ lattice is highly frustrated at low temperature. Further, the calculated normal mode frequencies of the Gd related $Γ$ point phonons are found to depend on the magnetic structure of the lattice, suggesting significant magneto-elastic coupling.
Submitted 20 August, 2016;
originally announced August 2016.
-
High-pressure x-ray diffraction study of bulk and nanocrystalline PbMoO4
Authors:
D. Errandonea,
D. Santamaria-Perez,
V. Grover,
S. N. Achary,
A. K. Tyagi
Abstract:
We studied the effects of high pressure on the crystalline structure of bulk and nanocrystalline scheelite-type PbMoO4. We found that in both cases the compressibility of the materials is highly non-isotropic, with the c-axis being the most compressible one. We also observed that the volume compressibility of nanocrystals becomes higher than the bulk one at 5 GPa. In addition, at 10.7(8) GPa we observed the onset of a structural phase transition in bulk PbMoO4. The high-pressure phase has a monoclinic structure similar to M-fergusonite. The transition is reversible and no volume change is detected between the low- and high-pressure phases. No additional structural changes or evidence of decomposition are found up to 21.1 GPa. In contrast, nanocrystalline PbMoO4 remains in the scheelite structure at least up to 16.1 GPa. Finally, the equations of state for bulk and nanocrystalline PbMoO4 are also determined.
Submitted 11 October, 2010;
originally announced October 2010.