-
Extending Sparse Tensor Accelerators to Support Multiple Compression Formats
Authors:
Eric Qin,
Geonhwa Jeong,
William Won,
Sheng-Chun Kao,
Hyoukjun Kwon,
Sudarshan Srinivasan,
Dipankar Das,
Gordon E. Moon,
Sivasankaran Rajamanickam,
Tushar Krishna
Abstract:
Sparsity, which occurs in both scientific applications and Deep Learning (DL) models, has been a key target of optimization within recent ASIC accelerators due to the potential memory and compute savings. These applications use data stored in a variety of compression formats. We demonstrate that both the compactness of different compression formats and the compute efficiency of the algorithms enabled by them vary across tensor dimensions and amount of sparsity. Since DL and scientific workloads span across all sparsity regions, there can be numerous format combinations for optimizing memory and compute efficiency. Unfortunately, many proposed accelerators operate on one or two fixed format combinations. This work proposes hardware extensions to accelerators for supporting numerous format combinations seamlessly and demonstrates ~4X speedup over performing format conversions in software.
Submitted 18 March, 2021;
originally announced March 2021.
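The abstract's point that format compactness varies with tensor dimensions and sparsity can be illustrated with a small sketch. It is not the paper's method. It assumes a 2-D matrix, the standard COO and CSR layouts, and that every index and every value occupies one storage word; the function names are hypothetical.

```python
def dense_words(rows, cols):
    # Dense storage keeps every element, zero or not.
    return rows * cols


def coo_words(nnz):
    # COO keeps a (row index, column index, value) triple per nonzero.
    return 3 * nnz


def csr_words(rows, nnz):
    # CSR keeps a value and a column index per nonzero,
    # plus rows + 1 row pointers.
    return 2 * nnz + rows + 1


def most_compact(rows, cols, nnz):
    # Return the name of the format with the smallest footprint
    # under the one-word-per-entry assumption stated above.
    sizes = {
        "dense": dense_words(rows, cols),
        "coo": coo_words(nnz),
        "csr": csr_words(rows, nnz),
    }
    return min(sizes, key=sizes.get)
```

Under these assumptions, a 1000 x 1000 matrix with 10 nonzeros is smallest in COO, the same matrix with 100,000 nonzeros is smallest in CSR, and a nearly full 10 x 10 matrix is smallest stored densely.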
-
RP-VIO: Robust Plane-based Visual-Inertial Odometry for Dynamic Environments
Authors:
Karnik Ram,
Chaitanya Kharyal,
Sudarshan S. Harithas,
K. Madhava Krishna
Abstract:
Modern visual-inertial navigation systems (VINS) are faced with a critical challenge in real-world deployment: they need to operate reliably and robustly in highly dynamic environments. Current best solutions merely filter dynamic objects as outliers based on the semantics of the object category. Such an approach does not scale as it requires semantic classifiers to encompass all possibly-moving object classes; this is hard to define, let alone deploy. On the other hand, many real-world environments exhibit strong structural regularities in the form of planes such as walls and ground surfaces, which are also crucially static. We present RP-VIO, a monocular visual-inertial odometry system that leverages the simple geometry of these planes for improved robustness and accuracy in challenging dynamic environments. Since existing datasets have a limited number of dynamic elements, we also present a highly-dynamic, photorealistic synthetic dataset for a more effective evaluation of the capabilities of modern VINS systems. We evaluate our approach on this dataset, and three diverse sequences from standard datasets including two real-world dynamic sequences and show a significant improvement in robustness and accuracy over a state-of-the-art monocular visual-inertial odometry system. We also show in simulation an improvement over a simple dynamic-features masking approach. Our code and dataset are publicly available.
Submitted 5 December, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
-
Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations
Authors:
Neil Jethani,
Mukund Sudarshan,
Yindalon Aphinyanaphongs,
Rajesh Ranganath
Abstract:
While the need for interpretable machine learning has been established, many common approaches are slow, lack fidelity, or are hard to evaluate. Amortized explanation methods reduce the cost of providing interpretations by learning a global selector model that returns feature importances for a single instance of data. The selector model is trained to optimize the fidelity of the interpretations, as evaluated by a predictor model for the target. Popular methods learn the selector and predictor model in concert, which we show allows predictions to be encoded within interpretations. We introduce EVAL-X as a method to quantitatively evaluate interpretations and REAL-X as an amortized explanation method, which learn a predictor model that approximates the true data generating distribution given any subset of the input. We show EVAL-X can detect when predictions are encoded in interpretations and show the advantages of REAL-X through quantitative and radiologist evaluation.
Submitted 2 March, 2021;
originally announced March 2021.
-
The Sensitivity of Word Embeddings-based Author Detection Models to Semantic-preserving Adversarial Perturbations
Authors:
Jeremiah Duncan,
Fabian Fallas,
Chris Gropp,
Emily Herron,
Maria Mahbub,
Paula Olaya,
Eduardo Ponce,
Tabitha K. Samuel,
Daniel Schultz,
Sudarshan Srinivasan,
Maofeng Tang,
Viktor Zenkov,
Quan Zhou,
Edmon Begoli
Abstract:
Authorship analysis is an important subject in the field of natural language processing. It allows the detection of the most likely writer of articles, news, books, or messages. This technique has multiple uses in tasks related to authorship attribution, detection of plagiarism, style analysis, sources of misinformation, etc. The focus of this paper is to explore the limitations and sensitivity of established approaches to adversarial manipulations of inputs. To this end, and using those established techniques, we first developed an experimental framework for author detection and input perturbations. Next, we experimentally evaluated the performance of the authorship detection model on a collection of semantic-preserving adversarial perturbations of input narratives. Finally, we compare and analyze the effects of different perturbation strategies, input and model configurations, and the effects of these on the author detection model.
Submitted 23 February, 2021;
originally announced February 2021.
-
A multiscale model of vascular function in chronic thromboembolic pulmonary hypertension
Authors:
Mitchel J. Colebank,
M. Umar Qureshi,
Sudarshan Rajagopal,
Richard A. Krasuski,
Mette S. Olufsen
Abstract:
Chronic thromboembolic pulmonary hypertension (CTEPH) is caused by recurrent or unresolved pulmonary thromboemboli, leading to perfusion defects and increased arterial wave reflections. CTEPH treatment aims to reduce pulmonary arterial pressure and reestablish adequate lung perfusion, yet patients with distal lesions are inoperable by standard surgical intervention. Instead, these patients undergo balloon pulmonary angioplasty (BPA), a multi-session, minimally invasive surgery that disrupts the thromboembolic material within the vessel lumen using a catheter balloon. However, an integrative, holistic tool for identifying optimal target lesions for treatment is still lacking. To address this insufficiency, we simulate CTEPH hemodynamics and BPA therapy using a multiscale fluid dynamics model. The large pulmonary arterial geometry is derived from a computed tomography (CT) image, whereas a fractal tree represents the small vessels. We model ring- and web-like lesions, common in CTEPH, and simulate normotensive conditions and four CTEPH disease scenarios; the latter includes both large artery lesions and vascular remodeling. BPA therapy is simulated by simultaneously reducing lesion severity in three locations. Our predictions mimic severe CTEPH, manifested by an increase in mean proximal pulmonary arterial pressure above 20 mmHg and prominent wave reflections. Both flow and pressure decrease in vessels distal to the lesions and increase in unobstructed vascular regions. We use the main pulmonary artery (MPA) pressure, a wave reflection index, and a measure of flow heterogeneity to select optimal target lesions for BPA. In summary, this study provides a multiscale, image-to-hemodynamics pipeline for BPA therapy planning for inoperable CTEPH patients.
Submitted 1 June, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Eight fold quantum Hall phases in a time reversal symmetry broken tight binding model
Authors:
Sudarshan Saha,
Tanay Nag,
Saptarshi Mandal
Abstract:
We consider a time reversal symmetry (TRS) broken Kane-Mele model superimposed with Haldane model and chart out the phase diagram using spin Chern number to investigate the fate of quantum anomalous Hall insulator (QAHI) and quantum spin Hall insulator (QSHI) phases. Interestingly, in addition to QSHI and QAHI phase, the phase diagram unveils quantum anomalous spin Hall insulator (QASHI) phase where only one spin sector is topological. We also find multicritical points where three / four topological phase boundaries coalesce. These topological phases are protected by an effective TRS and a composite anti-unitary particle-hole symmetry leading to remarkable properties of edge modes. We find spin-selective, spin-polarized and spin-neutral edge transport in QASHI, QSHI and QAHI phases respectively. Our study indicates that the robustness of the topological phase mainly depends on the spin gap which does not necessarily vanish at the Dirac points across a topological phase transition. We believe that our proposals can be tested in near future using recent experimental advancements in solid state and cold atomic systems.
Submitted 26 June, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Point absorbers in Advanced LIGO
Authors:
Aidan F. Brooks,
Gabriele Vajente,
Hiro Yamamoto,
Rich Abbott,
Carl Adams,
Rana X. Adhikari,
Alena Ananyeva,
Stephen Appert,
Koji Arai,
Joseph S. Areeda,
Yasmeen Asali,
Stuart M. Aston,
Corey Austin,
Anne M. Baer,
Matthew Ball,
Stefan W. Ballmer,
Sharan Banagiri,
David Barker,
Lisa Barsotti,
Jeffrey Bartlett,
Beverly K. Berger,
Joseph Betzwieser,
Dripta Bhattacharjee,
Garilynn Billingsley,
Sebastien Biscans
, et al. (176 additional authors not shown)
Abstract:
Small, highly absorbing points are randomly present on the surfaces of the main interferometer optics in Advanced LIGO. The resulting nanometer-scale thermo-elastic deformations and substrate lenses from these micron-scale absorbers significantly reduce the sensitivity of the interferometer, directly through a reduction in the power-recycling gain and indirectly through interactions with the feedback control system. We review the expected surface deformation from point absorbers and provide a pedagogical description of the impact on power build-up in second generation gravitational wave detectors (dual-recycled Fabry-Perot Michelson interferometers). This analysis predicts that the power-dependent reduction in interferometer performance will significantly degrade maximum stored power by up to 50% and hence limit GW sensitivity, but suggests system-wide corrections that can be implemented in current and future GW detectors. This is particularly pressing given that future GW detectors call for an order of magnitude more stored power than currently used in Advanced LIGO in Observing Run 3. We briefly review strategies to mitigate the effects of point absorbers in current and future GW detectors to maximize the success of these enterprises.
Submitted 25 March, 2021; v1 submitted 14 January, 2021;
originally announced January 2021.
-
Latent Alignment of Procedural Concepts in Multimodal Recipes
Authors:
Hossein Rajaby Faghihi,
Roshanak Mirzaee,
Sudarshan Paliwal,
Parisa Kordjamshidi
Abstract:
We propose a novel alignment mechanism to deal with procedural reasoning on a newly released multimodal QA dataset, named RecipeQA. Our model is solving the textual cloze task which is a reading comprehension on a recipe containing images and instructions. We exploit the power of attention networks, cross-modal representations, and a latent alignment space between instructions and candidate answers to solve the problem. We introduce constrained max-pooling which refines the max-pooling operation on the alignment matrix to impose disjoint constraints among the outputs of the model. Our evaluation result indicates a 19\% improvement over the baselines.
Submitted 12 January, 2021;
originally announced January 2021.
-
Disease contagion models coupled to crowd motion and mesh-free simulation
Authors:
Parveena Samim Abdul Salam,
Wolfgang Bock,
Axel Klar,
Sudarshan Tiwari
Abstract:
Modeling and simulation of disease spreading in pedestrian crowds has recently become a topic of increasing relevance. In this paper, we consider the influence of the crowd motion in a complex dynamical environment on the course of infection of the pedestrians. To model the pedestrian dynamics we consider a kinetic equation for multi-group pedestrian flow based on a social force model coupled with an Eikonal equation. This model is coupled with a non-local SEIS contagion model for disease spread, where, besides the description of local contacts, the influence of contact times is also modelled. Hydrodynamic approximations of the coupled system are derived. Finally, simulations of the hydrodynamic model are carried out using a mesh-free particle method. Different numerical test cases are investigated including uni- and bi-directional flow in a passage with and without obstacles.
Submitted 5 January, 2021;
originally announced January 2021.
-
BMS algebra from residual gauge invariance in light-cone gravity
Authors:
Sudarshan Ananth,
Lars Brink,
Sucheta Majumdar
Abstract:
We analyze the residual gauge freedom in gravity, in four dimensions, in the light-cone gauge, in a formulation where unphysical fields are integrated out. By checking the invariance of the light-cone Hamiltonian, we obtain a set of residual gauge transformations, which satisfy the BMS algebra realized on the two physical fields in the theory. Hence, the BMS algebra appears as a consequence of residual gauge invariance in the bulk and not just at the asymptotic boundary. We highlight the key features of the light-cone BMS algebra and discuss its connection with the quadratic form structure of the Hamiltonian.
Submitted 9 November, 2021; v1 submitted 31 December, 2020;
originally announced January 2021.
-
Base Station Coordination Scheme for Multi-tier Ultra-dense Networks
Authors:
Sudarshan Mukherjee,
Dongsun Kim,
Jemin Lee
Abstract:
In this paper, we consider a relative received link power (RRLP)-based coordinated multi-point (CoMP) joint transmission (JT) in the multi-tier ultra-dense networks (UDN). In this CoMP scheme, we identify the cooperating base stations (BSs) by comparing the average received link power (ARLP) of the neighbouring BSs with respect to the BS having the strongest ARLP (i.e., the main link BS) to a user. To analyze the performance of this CoMP scheme in the downlink multi-tier UDN, we first approximate the received signal power distribution, and derive the coverage probability using stochastic geometry. After revisiting the area spectral efficiency (ASE) to make it more suitable for CoMP transmission in UDN, we also analyze the ASE and the network energy efficiency (NEE). Using simulations, we validate the derived coverage probability, and investigate the CoMP performance in multi-tier UDN. Our simulations show that the RRLP-based CoMP scheme can outperform the fixed number of strongest BS-based CoMP scheme in the high BS density regime. Our study of the NEE performance reveals that not only the RRLP-based CoMP scheme is more efficient than conventional non-CoMP transmission scenario, but also its NEE performance improves with the average number of cooperating BSs.
Submitted 18 December, 2020;
originally announced December 2020.
-
BMS algebra as an extension of the Poincaré symmetry in light-cone gravity
Authors:
Sudarshan Ananth,
Lars Brink,
Sucheta Majumdar
Abstract:
We analyze possible local extensions of the Poincaré symmetry in light-cone gravity in four dimensions. We use a formalism where we represent the algebra on the two physical degrees of freedom, one with helicity $2$ and the other with helicity $-2$. The representation is non-linearly realized and one of the light-cone momenta is the Hamiltonian, which is hence a non-linear generator of the algebra. We find that this can be locally realized and the Poincaré algebra extended to the BMS symmetry without any reference to asymptotic limits.
Submitted 20 July, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Performance Analysis of Cell Free Massive MIMO systems in LoS/ NLoS Channels
Authors:
Sudarshan Mukherjee,
Ribhu Chopra
Abstract:
In cellular communication systems, it is conventional to assume the absence of a line of sight (LoS) path between the users and their associated access points (APs). This assumption however becomes questionable in the context of recent developments in the direction of cell free (CF) massive MIMO systems. In the CF massive MIMO, the AP density is assumed to be comparable with the user density, which increases probability of existence of an LoS path between the users and their associated APs. In this paper, we analyze the performance of an uplink CF massive MIMO system, with a probabilistic LoS channel model. Here, we first derive the effective statistics of this channel model, and argue that their behaviour is fundamentally different from that of the conventional rich scattering channels. Utilizing these statistics, we next compare the rates achievable by CF massive MIMO systems, under both stream-wise and joint decoding at the central processing unit. Following this, we also discuss the centralized MMSE based data detection to obtain a complexity/ performance trade-off. Finally, using detailed Monte-Carlo simulations, we validate our analytical results, and evaluate the performance of the three data detection schemes.
Submitted 30 November, 2020;
originally announced November 2020.
-
Fully gapped superconductivity in centrosymmetric and non-centrosymmetric Re-B compounds probed with $μ$SR
Authors:
S. Sharma,
Arushi,
K. Motla,
J. Beare,
M. Nugent,
M. Pula,
T. J. Munsie,
A. D. Hillier,
R. P. Singh,
G. M. Luke
Abstract:
We present a comprehensive study on superconducting properties of Re$_7$B$_3$ and Re$_3$B through specific heat, magnetic susceptibility, resistivity, and transverse and zero-field muon spin rotation/relaxation ($μ$SR) experiments on polycrystalline samples. Re$_7$B$_3$ (T$_C$ = 3.2~K) is a non-centrosymmetric type-II ($κ$ $\approx$ 9.27) superconductor in the weak coupling ($λ_{e-ph}$ = 0.54) regime. On the other hand, Re$_3$B (T$_C$ = 5.19~K) is a centrosymmetric type-II ($κ$ $\approx$ 34.55) superconductor in the moderate coupling ($λ_{e-ph}$ = 0.64) regime. Our transverse-field $μ$SR measurements show evidence for isotropically gapped BCS type superconductivity with normalized gap ($Δ_0/k_BT_C$) values of 1.69 (Re$_7$B$_3$) and 1.75 (Re$_3$B).
Submitted 26 February, 2021; v1 submitted 26 November, 2020;
originally announced November 2020.
-
Unconventional Hall effect and its variation with Co-doping in van der Waals Fe3GeTe2
Authors:
Rajeswari Roy Chowdhury,
Samik DuttaGupta,
Chandan Patra,
Oleg A. Tretiakov,
Sudarshan Sharma,
Shunsuke Fukami,
Hideo Ohno,
Ravi Prakash Singh
Abstract:
Two-dimensional (2D) van der Waals (vdW) magnetic materials have attracted a lot of attention owing to the stabilization of long-range magnetic order down to atomic dimensions, and the prospect of novel spintronic devices with unique functionalities. The clarification of the magnetoresistive properties and their correlation to the underlying magnetic configurations is essential for 2D vdW-based spintronic devices. Here, the effect of Co-doping on the magnetic and magnetotransport properties of Fe3GeTe2 has been investigated. Magnetotransport measurements reveal an unusual Hall effect behavior whose strength was considerably modified by Co-doping and attributed to arise from the underlying complicated spin textures. The present results provide a clue to tailoring of the underlying interactions necessary for the realization of a variety of unconventional spin textures for 2D vdW FM-based spintronics.
Submitted 24 March, 2021; v1 submitted 23 November, 2020;
originally announced November 2020.
-
The BayesWave analysis pipeline in the era of gravitational wave observations
Authors:
Neil J. Cornish,
Tyson B. Littenberg,
Bence Bécsy,
Katerina Chatziioannou,
James A. Clark,
Sudarshan Ghonge,
Margaret Millhouse
Abstract:
We describe updates and improvements to the BayesWave gravitational wave transient analysis pipeline, and provide examples of how the algorithm is used to analyze data from ground-based gravitational wave detectors. BayesWave models gravitational wave signals in a morphology-independent manner through a sum of frame functions, such as Morlet-Gabor wavelets or chirplets. BayesWave models the instrument noise using a combination of a parametrized Gaussian noise component and non-stationary and non-Gaussian noise transients. Both the signal model and noise model employ trans-dimensional sampling, with the complexity of the model adapting to the requirements of the data. The flexibility of the algorithm makes it suitable for a variety of analyses, including reconstructing generic unmodeled signals; cross checks against modeled analyses for compact binaries; as well as separating coherent signals from incoherent instrumental noise transients (glitches). The BayesWave model has been extended to account for gravitational wave signals with generic polarization content and the simultaneous presence of signals and glitches in the data. We describe updates in the BayesWave prior distributions, sampling proposals, and burn-in stage that provide significantly improved sampling efficiency. We present standard review checks indicating the robustness and convergence of the BayesWave trans-dimensional sampler.
Submitted 14 January, 2021; v1 submitted 18 November, 2020;
originally announced November 2020.
-
Centrality Measures in Complex Networks: A Survey
Authors:
Akrati Saxena,
Sudarshan Iyengar
Abstract:
In complex networks, each node has some unique characteristics that define the importance of the node based on the given application-specific context. These characteristics can be identified using various centrality metrics defined in the literature. Some of these centrality measures can be computed using local information of the node, such as degree centrality and semi-local centrality measure. Others use global information of the network like closeness centrality, betweenness centrality, eigenvector centrality, Katz centrality, PageRank, and so on. In this survey, we discuss these centrality measures and the state of the art literature that includes the extension of centrality measures to different types of networks, methods to update centrality values in dynamic networks, methods to identify top-k nodes, approximation algorithms, open research problems related to the domain, and so on. The paper is concluded with a discussion on application specific centrality measures that will help to choose a centrality measure based on the network type and application requirements.
Submitted 13 November, 2020;
originally announced November 2020.
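The distinction the survey abstract draws between local and global centrality measures can be sketched briefly. This is an illustration, not code from the survey. It assumes an undirected, unweighted, connected graph given as an adjacency dictionary; the function names and the example graph are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    # Local measure: needs only each node's own neighbour count,
    # normalised by the maximum possible degree n - 1.
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    # Global measure: needs shortest-path distances to every other
    # node, found here with a breadth-first search from each source.
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        # Reachable nodes (excluding src) divided by total distance.
        result[src] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical three-node path graph: a - b - c
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

On the path graph, the middle node `b` scores highest on both measures, but degree centrality reads only `b`'s neighbour set while closeness centrality traverses the whole graph.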
-
Exosphere -- Bringing The Cloud Closer
Authors:
Julian L. Pistorius,
Chris Martin,
Sanjana Sudarshan,
David S. LeBauer
Abstract:
Exosphere provides researcher-friendly software for managing computing workloads on OpenStack cloud infrastructure. Exosphere is a user-friendly alternative to Horizon, the default OpenStack graphical interface. Exosphere can be used with most research cloud infrastructure, requiring near-zero custom integration work.
Submitted 13 October, 2020; v1 submitted 24 August, 2020;
originally announced August 2020.
-
Materials preparation, single crystal growth, and the phase diagram of the cuprate high temperature superconductor La1.6-xNd0.4SrxCuO4
Authors:
Mirela Dragomir,
Qianli Ma,
J. Patrick Clancy,
Amirreza Ataei,
Paul A. Dube,
Sudarshan Sharma,
Ashfia Huq,
Hanna A. Dabkowska,
Louis Taillefer,
Bruce D. Gaulin
Abstract:
One branch of the La-214 family of cuprate superconductors, La1.6-xNd0.4SrxCuO4 (Nd-LSCO), has been of significant and sustained interest, in large part because it displays the full complexity of the phase diagram for canonical hole-doped, high Tc superconductivity, while also displaying relatively low superconducting critical temperatures. The low superconducting Tc's imply that experimentally accessible magnetic fields can suppress the superconductivity to zero temperature. In particular, this has enabled various transport and thermodynamic studies of the T = 0 ground state in Nd-LSCO, free of superconductivity, across the critical doping p* = 0.23 where the pseudogap phase ends. The strong dependence of its superconducting properties on its crystal symmetry has itself motivated careful studies of the Nd-LSCO structural phase diagram. This paper provides a systematic study and summary of the materials preparation and characterization of both single crystal and polycrystalline samples of Nd-LSCO. Single-phase polycrystalline samples with x spanning the range from 0.01 to 0.40 have been synthesized, and large single crystals of Nd-LSCO for select x across the region (0.07, 0.12, 0.17, 0.19, 0.225, 0.24, and 0.26) were grown by the optical floating zone method. Systematic neutron and X-ray diffraction studies on these samples were performed at both low and room temperatures, 10 K and 300 K, respectively. These studies allowed us to follow the various structural phase transitions and propose an updated structural phase diagram for Nd-LSCO. In particular, we found that the low-temperature tetragonal (LTT) phase ends at a critical doping pLTT = 0.255(5), clearly separated from p*.
Submitted 17 August, 2020;
originally announced August 2020.
-
Deep Direct Likelihood Knockoffs
Authors:
Mukund Sudarshan,
Wesley Tansey,
Rajesh Ranganath
Abstract:
Predictive modeling often uses black box machine learning methods, such as deep neural networks, to achieve state-of-the-art performance. In scientific domains, the scientist often wishes to discover which features are actually important for making the predictions. These discoveries may lead to costly follow-up experiments and as such it is important that the error rate on discoveries is not too high. Model-X knockoffs enable important features to be discovered with control of the FDR. However, knockoffs require rich generative models capable of accurately modeling the knockoff features while ensuring they obey the so-called "swap" property. We develop Deep Direct Likelihood Knockoffs (DDLK), which directly minimizes the KL divergence implied by the knockoff swap property. DDLK consists of two stages: it first maximizes the explicit likelihood of the features, then minimizes the KL divergence between the joint distribution of features and knockoffs and any swap between them. To ensure that the generated knockoffs are valid under any possible swap, DDLK uses the Gumbel-Softmax trick to optimize the knockoff generator under the worst-case swap. We find DDLK has higher power than baselines while controlling the false discovery rate on a variety of synthetic and real benchmarks including a task involving a large dataset from one of the epicenters of COVID-19.
Submitted 31 July, 2020;
originally announced July 2020.
-
Learning Abstract Models for Strategic Exploration and Fast Reward Transfer
Authors:
Evan Zheran Liu,
Ramtin Keramati,
Sudarshan Seshadri,
Kelvin Guu,
Panupong Pasupat,
Emma Brunskill,
Percy Liang
Abstract:
Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions. However, learning an accurate Markov Decision Process (MDP) over high-dimensional states (e.g., raw pixels) is extremely challenging because it requires function approximation, which leads to compounding errors. Instead, to avoid compounding errors, we propose learning an abstract MDP over abstract states: low-dimensional coarse representations of the state (e.g., capturing agent position, ignoring other objects). We assume access to an abstraction function that maps the concrete states to abstract states. In our approach, we construct an abstract MDP, which grows through strategic exploration via planning. Similar to hierarchical RL approaches, the abstract actions of the abstract MDP are backed by learned subpolicies that navigate between abstract states. Our approach achieves strong results on three of the hardest Arcade Learning Environment games (Montezuma's Revenge, Pitfall!, and Private Eye), including superhuman performance on Pitfall! without demonstrations. After training on one task, we can reuse the learned abstract MDP for new reward functions, achieving higher reward in 1000x fewer samples than model-free methods trained from scratch.
Submitted 11 July, 2020;
originally announced July 2020.
-
Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms
Authors:
Saeed Rashidi,
Matthew Denton,
Srinivas Sridharan,
Sudarshan Srinivasan,
Amoghavarsha Suresh,
Jade Ni,
Tushar Krishna
Abstract:
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators (e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes (GBs) of bandwidth. However, as we identify in this work, driving this bandwidth is quite challenging. This is because there is a pernicious balance between using the accelerator's compute and memory for both DL computations and communication. This work makes two key contributions. First, via real system measurements and detailed modeling, we provide an understanding of compute and memory bandwidth demands for DL compute and comms. Second, we propose a novel DL collective communication accelerator called Accelerator Collectives Engine (ACE) that sits alongside the compute and networking engines at the accelerator endpoint. ACE frees up the endpoint's compute and memory resources for DL compute, which in turn reduces the required memory BW by 3.5X on average to drive the same network BW compared to state-of-the-art baselines. For modern DL workloads and different network sizes, ACE, on average, increases the effective network bandwidth utilization by 1.44X (up to 2.67X), resulting in an average of 1.41X (up to 1.51X), 1.12X (up to 1.17X), and 1.13X (up to 1.19X) speedup in iteration time for ResNet-50, GNMT and DLRM when compared to the best baseline configuration, respectively.
Submitted 4 May, 2022; v1 submitted 30 June, 2020;
originally announced July 2020.
-
LALR: Theoretical and Experimental validation of Lipschitz Adaptive Learning Rate in Regression and Neural Networks
Authors:
Snehanshu Saha,
Tejas Prashanth,
Suraj Aralihalli,
Sumedh Basarkod,
T. S. B Sudarshan,
Soma S Dhavala
Abstract:
We propose a theoretical framework for an adaptive learning rate policy for the Mean Absolute Error loss function and Quantile loss function and evaluate its effectiveness for regression tasks. The framework is based on the theory of Lipschitz continuity, specifically utilizing the relationship between learning rate and Lipschitz constant of the loss function. Based on experimentation, we have found that the adaptive learning rate policy enables up to 20x faster convergence compared to a constant learning rate policy.
Submitted 19 May, 2020;
originally announced June 2020.
-
Detection and parameter estimation of binary neutron star merger remnants
Authors:
Paul J. Easter,
Sudarshan Ghonge,
Paul D. Lasky,
Andrew R. Casey,
James A. Clark,
Francisco Hernandez Vivanco,
Katerina Chatziioannou
Abstract:
Detection and parameter estimation of binary neutron star merger remnants can shed light on the physics of hot matter at supranuclear densities. Here we develop a fast, simple model that can generate gravitational waveforms, and show it can be used for both detection and parameter estimation of post-merger remnants. The model consists of three exponentially-damped sinusoids with a linear frequency-drift term. The median fitting factors between the model waveforms and numerical-relativity simulations exceed 0.90. We detect remnants at a post-merger signal-to-noise ratio of $\ge 7$ using a Bayes-factor detection statistic with a threshold of 3000. We can constrain the primary post-merger frequency to $\pm_{1.2}^{1.4}\%$ at post-merger signal-to-noise ratios of 15 with an increase in precision to $\pm_{0.2}^{0.3}\%$ for post-merger signal-to-noise ratios of 50. The tidal coupling constant can be constrained to $\pm^{9}_{12}\%$ at post-merger signal-to-noise ratios of 15, and $\pm 5\%$ at post-merger signal-to-noise ratios of 50 using a hierarchical inference model.
Submitted 8 June, 2020;
originally announced June 2020.
-
Supersymmetric Yang-Mills theory in D=6 without anti-commuting variables
Authors:
Sudarshan Ananth,
Hannes Malcha,
Chetan Pandey,
Saurabh Pant
Abstract:
Supersymmetric Yang-Mills theory is formulated in six dimensions, without the use of anti-commuting variables. This is achieved using a new Nicolai map, to third order in the coupling constant. This is the second such map in six dimensions and highlights a potential ambiguity in the formalism.
Submitted 8 July, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Perturbative linearization of supersymmetric Yang-Mills theory
Authors:
Sudarshan Ananth,
Olaf Lechtenfeld,
Hannes Malcha,
Hermann Nicolai,
Chetan Pandey,
Saurabh Pant
Abstract:
Supersymmetric gauge theories are characterized by the existence of a transformation of the bosonic fields (Nicolai map) such that the Jacobi determinant of the transformation equals the product of the Matthews-Salam-Seiler and Faddeev-Popov determinants. This transformation had been worked out to second order in the coupling constant. In this paper, we extend this result (and the framework itself) to third order in the coupling constant. A diagrammatic approach in terms of tree diagrams, aiming to extend this map to arbitrary orders, is outlined. This formalism bypasses entirely the use of anti-commuting variables, as well as issues concerning the (non-)existence of off-shell formulations for these theories. It thus offers a fresh perspective on supersymmetric gauge theories and, in particular, the ubiquitous $\mathcal N{=}\,4$ theory.
Submitted 21 September, 2020; v1 submitted 25 May, 2020;
originally announced May 2020.
-
Higher spins, quadratic forms and amplitudes
Authors:
Sudarshan Ananth,
Chetan Pandey,
Saurabh Pant
Abstract:
The light-cone Hamiltonians for spin 1 and spin 2 fields, describing both the pure and the maximally supersymmetric theories, may be expressed as quadratic forms. In this paper, we show that this feature extends to light-cone higher spin theories. To first order in the coupling constant, we prove that the higher spin Hamiltonians, with and without supersymmetry, are quadratic forms. Scattering amplitude structures emerge naturally in this framework and we relate the momentum space vertex in a supersymmetric higher spin theory to the corresponding vertex in the N=4 Yang-Mills theory.
Submitted 20 May, 2020;
originally announced May 2020.
-
GenNav: A Generic Indoor Navigation System for any Mobile Robot
Authors:
Sudarshan S Harithas,
Biswajit Pardia
Abstract:
The navigation system is at the heart of any mobile robot. It comprises SLAM and path planning units, which the robot uses to generate a map of the environment, localize itself within it, and determine an optimal path to the destination. This paper describes the conceptualization, development, simulation and hardware implementation of GenNav, a generic indoor navigation system for any mobile aerial or ground robot. The generalization is brought about by modularizing and creating independence between the software computation and hardware actuation units by providing an alternate source of odometry from the LiDAR, eliminating the requirement for dedicated odometry sensors. The odometry feedback from the LiDAR can be used by the navigation computation unit, and the system can be generalized to a wide variety of robots with different types and orientations of actuators.
Submitted 1 August, 2020; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Authors:
Dhiraj Kalamkar,
Evangelos Georganas,
Sudarshan Srinivasan,
Jianping Chen,
Mikhail Shiryaev,
Alexander Heinecke
Abstract:
During the last two years, the goal of many researchers has been to squeeze the last bit of performance out of HPC systems for AI tasks. Often this discussion is held in the context of how fast ResNet50 can be trained. Unfortunately, ResNet50 is no longer a representative workload in 2020. Thus, we focus on Recommender Systems, which account for most of the AI cycles in cloud computing centers. More specifically, we focus on Facebook's DLRM benchmark. By enabling it to run on the latest CPU hardware and software tailored for HPC, we are able to achieve more than two orders of magnitude improvement in performance (110x) on a single socket compared to the reference CPU implementation, and high scaling efficiency up to 64 sockets, while fitting ultra-large datasets. This paper discusses the optimization techniques for the various operators in DLRM and which components of the system are stressed by these different operators. The presented techniques are applicable to a broader set of DL workloads that pose the same scaling challenges/characteristics as DLRM.
Submitted 10 May, 2020;
originally announced May 2020.
-
Characterization of systematic error in Advanced LIGO calibration
Authors:
Ling Sun,
Evan Goetz,
Jeffrey S. Kissel,
Joseph Betzwieser,
Sudarshan Karki,
Aaron Viets,
Madeline Wade,
Dripta Bhattacharjee,
Vladimir Bossilkov,
Pep B. Covas,
Laurence E. H. Datrier,
Rachel Gray,
Shivaraj Kandhasamy,
Yannick K. Lecoeuche,
Gregory Mendell,
Timesh Mistry,
Ethan Payne,
Richard L. Savage,
Alan J. Weinstein,
Stuart Aston,
Aaron Buikema,
Craig Cahillane,
Jenne C. Driggers,
Sheila E. Dwyer,
Rahul Kumar
, et al. (1 additional author not shown)
Abstract:
The raw outputs of the detectors within the Advanced Laser Interferometer Gravitational-Wave Observatory need to be calibrated in order to produce the estimate of the dimensionless strain used for astrophysical analyses. The two detectors have been upgraded since the second observing run and finished the year-long third observing run. Understanding, accounting, and/or compensating for the complex-valued response of each part of the upgraded detectors improves the overall accuracy of the estimated detector response to gravitational waves. We describe improved understanding and methods used to quantify the response of each detector, with a dedicated effort to define all places where systematic error plays a role. We use the detectors as they stand in the first half (six months) of the third observing run to demonstrate how each identified systematic error impacts the estimated strain and constrain the statistical uncertainty therein. For this time period, we estimate the upper limit on systematic error and associated uncertainty to be $< 7\%$ in magnitude and $< 4$ deg in phase ($68\%$ confidence interval) in the most sensitive frequency band 20-2000 Hz. The systematic error alone is estimated at levels of $< 2\%$ in magnitude and $< 2$ deg in phase.
Submitted 1 September, 2020; v1 submitted 5 May, 2020;
originally announced May 2020.
-
Reconstructing gravitational wave signals from binary black hole mergers with minimal assumptions
Authors:
Sudarshan Ghonge,
Katerina Chatziioannou,
James A. Clark,
Tyson Littenberg,
Margaret Millhouse,
Laura Cadonati,
Neil Cornish
Abstract:
We present a systematic comparison of the binary black hole (BBH) signal waveform reconstructed by two independent and complementary approaches used in LIGO and Virgo source inference: a template-based analysis, and a morphology-independent analysis. We apply the two approaches to real events and to two sets of simulated observations made by adding simulated BBH signals to LIGO and Virgo detector noise. The first set is representative of the 10 BBH events in the first Gravitational Wave Transient Catalog (GWTC-1). The second set is constructed from a population of BBH systems with total mass and signal strength in the ranges to which ground-based detectors are typically sensitive. We find that the reconstruction quality of the GWTC-1 events is consistent with the results of both sets of simulated signals. We also demonstrate a simulated case where the presence of a mismodelled effect in the observed signal, namely higher order modes, can be identified through the morphology-independent analysis. This study is relevant for currently progressing and future observational runs by LIGO and Virgo.
Submitted 22 September, 2020; v1 submitted 20 March, 2020;
originally announced March 2020.
-
A proto-object based audiovisual saliency map
Authors:
Sudarshan Ramenahalli
Abstract:
The natural environment and our interaction with it are essentially multisensory: we may deploy visual, tactile and/or auditory senses to perceive, learn and interact with our environment. Our objective in this study is to develop a scene analysis algorithm using multisensory information, specifically vision and audio. We develop a proto-object based audiovisual saliency map (AVSM) for the analysis of dynamic natural scenes. A specialized audiovisual camera with a $360^\circ$ field of view, capable of locating sound direction, is used to collect spatiotemporally aligned audiovisual data. We demonstrate that the performance of the proto-object based audiovisual saliency map in detecting and localizing salient objects/events is in agreement with human judgment. In addition, the proto-object based AVSM that we compute as a linear combination of visual and auditory feature conspicuity maps captures a higher number of valid salient events compared to unisensory saliency maps. Such an algorithm can be useful in surveillance, robotic navigation, video compression and related applications.
Submitted 15 March, 2020;
originally announced March 2020.
-
A model of figure ground organization incorporating local and global cues
Authors:
Sudarshan Ramenahalli
Abstract:
Figure Ground Organization (FGO) -- inferring spatial depth ordering of objects in a visual scene -- involves determining which side of an occlusion boundary is figure (closer to the observer) and which is ground (further away from the observer). A combination of global cues, like convexity, and local cues, like T-junctions are involved in this process. We present a biologically motivated, feed forward computational model of FGO incorporating convexity, surroundedness, parallelism as global cues and Spectral Anisotropy (SA), T-junctions as local cues. While SA is computed in a biologically plausible manner, the inclusion of T-Junctions is biologically motivated. The model consists of three independent feature channels, Color, Intensity and Orientation, but SA and T-Junctions are introduced only in the Orientation channel as these properties are specific to that feature of objects. We study the effect of adding each local cue independently and both of them simultaneously to the model with no local cues. We evaluate model performance based on figure-ground classification accuracy (FGCA) at every border location using the BSDS 300 figure-ground dataset. Each local cue, when added alone, gives statistically significant improvement in the FGCA of the model suggesting its usefulness as an independent FGO cue. The model with both local cues achieves higher FGCA than the models with individual cues, indicating SA and T-Junctions are not mutually contradictory. Compared to the model with no local cues, the feed-forward model with both local cues achieves $\geq 8.78$% improvement in terms of FGCA.
Submitted 14 March, 2020;
originally announced March 2020.
-
Laser stimulated second and third harmonic optical effects in F: SnO2 nanostructures grown via chemical synthetic route
Authors:
Anusha,
B. Sudarshan Acharya,
Albin Antony,
Aninamol Ani,
I. V. Kityk,
J. Jedryka,
P. Rakus,
A. Wojciechowski,
P. Poornesh,
Suresh D. Kulkarni
Abstract:
Laser stimulated second and third harmonic generation effects in fluorine-doped tin oxide (F:SnO2) nanostructures versus the fluorine content are presented. The F:SnO2 nanostructures have been fabricated at various fluorine doping concentrations by the spray pyrolysis technique. The films exhibit a polycrystalline nature with a preferential growth orientation along the (1 1 0) diffraction plane, as evident from x-ray diffraction studies. The optical transmittance of the F:SnO2 films increased from 68 percent to 80 percent. Photoluminescence studies revealed a strong violet emission peak at 400 nm and a relatively weak red emission peak at about 675 nm for all the F:SnO2 films. The increase in the $\beta_{\mathrm{eff}}$ value upon fluorine incorporation supports the applicability of the deposited films in passive optical limiting applications. The principal origin of second harmonic generation (SHG) signals for this type of nanostructure is the space charge density acentricity due to the F doping. The enhanced second and third harmonic generation signals observed in F:SnO2 nanostructures endorse the credibility of these materials in various nonlinear optical trigger device applications.
Submitted 8 January, 2020;
originally announced January 2020.
-
Supersymmetric Yang-Mills Theories: not quite the usual perspective
Authors:
Sudarshan Ananth,
Hermann Nicolai,
Chetan Pandey,
Saurabh Pant
Abstract:
In this paper, we take up an old thread of development concerning the characterization of supersymmetric theories without any use of anticommuting variables that goes back to one of the authors' very early work [1]. Our special focus here will be on the formulation of supersymmetric Yang-Mills theories, extending previous results beyond $D=4$ dimensions. This perspective is likely to provide new insights into these theories, and in particular the maximally extended $N=4$ theory. As a new result we present a novel derivation of the admissible dimensions for interacting (pure) super-Yang-Mills theories to exist. This article is dedicated to the memory of Peter Freund, amongst many other things an early contributor to supersymmetry, and an author of one of the very first papers on superconformal gauge theories [2]. The final section contains some personal reminiscences of H.N.'s encounters with Peter Freund.
Submitted 12 March, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
-
Strategic improvement of second and third harmonic generation in multifunctional Cu-Sn-S3 ternary semiconducting thin films
Authors:
B. Sudarshan Acharya,
Anushaa,
Albin Antony,
Aninamol Ani,
I. V. Kityk,
K. Ozga,
A. Slezak,
J. Jedryka,
P. Poornesh,
K. B. Manjunatha,
Shashidhara Acharya
Abstract:
We propose a low-cost approach for the synthesis of multifunctional CuSnS3 (CTS) ternary compound thin films via the spray pyrolysis technique. By varying Sn and S doping concentrations, high energy absorber layers of CuSnS3 thin films were deposited on a glass substrate at a substrate temperature of 405 C. The prepared samples were analysed with respect to their optical, structural, electrical and nonlinear optical properties. X-ray diffraction (XRD) analysis reveals that the films exhibit a tetragonal crystal structure with a preferential growth orientation along (1 1 2). The surface morphology of the films was explored by atomic force microscopy (AFM) in tapping mode configuration. Variation in the carrier charge density and electrical properties was observed for different Sn and S combinations. The analysis of the Raman spectra indicates the presence of multiple phases apart from CuSnS3. The obtained Raman spectra were assigned to phonon modes as per the zone centre phonon representation of optical and acoustic modes and identified with dominant A symmetry modes. The maximal laser stimulated second harmonic generation (SHG) signal was observed for the CTS 3 film, with a corresponding second order nonlinear optical susceptibility of about 0.89 pm/V at 1064 nm, and the minimal SHG signal was found for the CTS 2 film (about 0.22 pm/V). This strategic improvement in SHG and third harmonic generation (THG) signal efficiency endorses the role of Sn and S in modulating second and third harmonic generation in the CuSnS3 compound.
Submitted 13 January, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
-
Disentangling shock diffusion on complex networks: Identification through graph planarity
Authors:
Sudarshan Kumar,
Tiziana Di Matteo,
Anindya S. Chakrabarti
Abstract:
Large scale networks delineating collective dynamics often exhibit cascading failures across nodes leading to a system-wide collapse. Prominent examples of such phenomena include collapse on financial and economic networks. The intertwined nature of the dynamics of nodes in such a network makes it difficult to disentangle the source and destination of a shock that percolates through the network, a property known as reflexivity. In this article, a novel methodology is proposed which combines a vector autoregression model with unique identification restrictions obtained from the topological structure of the network to uniquely characterize cascades. In particular, we show that planarity of the network allows us to statistically estimate a dynamical process consistent with the observed network and thereby uniquely identify a path for shock propagation from any chosen epicenter to all other nodes in the network. We analyze the distress propagation mechanism in closed loops, giving rise to a detailed picture of the effect of feedback loops in transmitting shocks. We show the usefulness and applications of the algorithm in two networks with dynamics at different time-scales: the worldwide GDP growth network and a stock network. In both cases, we observe that the model predicts that the impact of shocks emanating from the US would be concentrated within the cluster of developed countries, while the developing countries show a very muted response, which is consistent with empirical observations over the past decade.
Submitted 6 January, 2020;
originally announced January 2020.
-
Edit Based Grading of SQL Queries
Authors:
Bikash Chandra,
Ananyo Banerjee,
Udbhas Hazra,
Mathew Joseph,
S. Sudarshan
Abstract:
Grading student SQL queries manually is a tedious and error-prone process. Earlier work on testing correctness of student SQL queries, such as the XData system, can be used to test correctness of a student query. However, in case a student query is found to be incorrect there is currently no way to automatically assign partial marks. Partial marking is important so that small errors are penalized less than large errors. Manually awarding partial marks is not scalable for classes with a large number of students, especially MOOCs, and is also prone to human errors.
In this paper, we discuss techniques to find a minimum cost set of edits to a student query that would make it correct, which can help assign partial marks and help students understand exactly where they went wrong. Given the limitations of current formal methods for checking equivalence, our approach is based on finding the nearest query, from a set of instructor-provided correct queries, that is found to be equivalent based on query canonicalization. We show that exhaustive techniques are expensive, and propose a greedy heuristic approach that works well both in terms of runtime and accuracy on queries in real-world datasets. Our system can also be used in a learning mode where query edits can be suggested as feedback to students to guide them towards a correct query. Our partial marking system has been successfully used in courses at IIT Bombay and IIT Dharwad.
Submitted 19 December, 2019;
originally announced December 2019.
-
Adaptive Multi-bit SRAM Topology Based Analog PUF
Authors:
Sudarshan Sharma,
Dhruv Thapar,
Nikhil Bhelave,
Mrigank Sharad
Abstract:
Physically Unclonable Functions (PUFs) are lightweight cryptographic primitives for generating unique signatures from minuscule manufacturing variations. In this work, we present a lightweight, area-efficient and low-power adaptive multi-bit SRAM topology based Current Mirror Array (CMA) analog PUF design for securing sensor nodes, authentication and key generation. The proposed Strong PUF increases the complexity of machine learning attacks, thus making them difficult for the adversary. The design is based on the scl180 library.
Submitted 14 December, 2019;
originally announced December 2019.
-
Domain-independent Dominance of Adaptive Methods
Authors:
Pedro Savarese,
David McAllester,
Sudarshan Babu,
Michael Maire
Abstract:
From a simplified analysis of adaptive methods, we derive AvaGrad, a new optimizer which outperforms SGD on vision tasks when its adaptability is properly tuned. We observe that the power of our method is partially explained by a decoupling of learning rate and adaptability, greatly simplifying hyperparameter search. In light of this observation, we demonstrate that, against conventional wisdom, Adam can also outperform SGD on vision tasks, as long as the coupling between its learning rate and adaptability is taken into account. In practice, AvaGrad matches the best results, as measured by generalization accuracy, delivered by any existing optimizer (SGD or adaptive) across image classification (CIFAR, ImageNet) and character-level language modelling (Penn Treebank) tasks.
Submitted 16 March, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
Criterion for existence of a logarithmic connection on a principal bundle over a smooth complex projective variety
Authors:
Sudarshan Gurjar,
Arjun Paul
Abstract:
Let $X$ be a connected smooth complex projective variety of dimension $n \geq 1$. Let $D$ be a simple normal crossing divisor on $X$. Let $G$ be a connected complex Lie group, and $E_G$ a holomorphic principal $G$-bundle on $X$. In this article, we give a criterion for the existence of a logarithmic connection on $E_G$ singular along $D$.
Submitted 30 June, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
-
Optimal Non-Coherent Detector for Ambient Backscatter Communication System
Authors:
Sudarshan Guruacharya,
Xiao Lu,
Ekram Hossain
Abstract:
The probability density function (pdf) of the received signal of an ambient backscatter communication system is derived, assuming that on-off keying (OOK) is performed at the tag, and that the ambient radio frequency (RF) signal is white Gaussian. The pdf of the received signal is then utilized to design two different types of non-coherent detectors. The first detector directly uses the received signal to perform a hypothesis test. The second detector first estimates the channel based on the observed signal and then performs the hypothesis test. Test statistics and optimal decision threshold of the detectors are derived. The energy detector is shown to be an approximation of the second detector. For cases where the reader is able to avoid or cancel the direct interference from the RF source (e.g., through successive interference cancellation), a third detector is given as a special case of the first detector. Numerical results show that both the first and the second detectors have the same bit error rate (BER) performance, making the second detector preferable over the first detector due to its computational simplicity.
Submitted 26 October, 2020; v1 submitted 22 November, 2019;
originally announced November 2019.
-
Existence of nodal line semi-metal in a generalized three dimensional Haldane model
Authors:
Sudarshan Saha,
Saptarshi Mandal
Abstract:
We construct and study a time reversal broken tight binding model on the diamond lattice with complex next-nearest-neighbour hopping, which can be thought of as a generalisation of the two dimensional Haldane model to three dimensions. The model also breaks inversion symmetry owing to a sub-lattice dependent chemical potential. We calculate the spectrum of the model and find the existence of six pairs of anisotropic gapless points with linear dependence on momentum. The coordinates of the gapless points are $(2\pi, \pi \pm k_0, 0)$, $(2\pi, \pi \pm k_0, 2\pi)$ and their possible permutations. The condition for a gapless spectrum is very similar to the two dimensional case. Each gapless point has a well defined chirality, and in the gapless phase a specific set of planes has non-zero Chern number. The gapped phase is a trivial bulk insulator which has vanishing Chern number as well as Hopf index. The model belongs to the symmetry class AIII according to the ten-fold way of classification. Surprisingly, the gapless phase does contain a gapped surface state, whereas the gapped state has gapless surface states, as found in the (1,1,1) direction.
Submitted 22 November, 2019;
originally announced November 2019.
-
eBrainII: A 3 kW Realtime Custom 3D DRAM integrated ASIC implementation of a Biologically Plausible Model of a Human Scale Cortex
Authors:
Dimitrios Stathis,
Chirag Sudarshan,
Yu Yang,
Matthias Jung,
Syed Asad Mohamad Hasan Jafri,
Christian Weis,
Ahmed Hemani,
Anders Lansner,
Norbert Wehn
Abstract:
Artificial Neural Networks (ANNs) like CNN/DNN and LSTM are not biologically plausible and, in spite of their initial success, they cannot attain the cognitive capabilities enabled by the dynamic hierarchical associative memory systems of biological brains. The biologically plausible spiking brain models, e.g. of cortex, basal ganglia and amygdala, have a greater potential to achieve biological brain like cognitive capabilities. Bayesian Confidence Propagation Neural Network (BCPNN) is a biologically plausible spiking model of cortex. A human scale model of BCPNN in real time requires 162 TFlops/s, 50 TBs of synaptic weight storage to be accessed with a bandwidth of 200 TBs. The spiking bandwidth is relatively modest at 250 GBs/s. A hand optimized implementation of rodent scale BCPNN on Tesla K80 GPUs requires 3 kW; we extrapolate from that that a human scale network will require 3 MW. These power numbers rule out such implementations for field deployment as advanced cognition engines in embedded systems. The key innovation that this paper reports is that it is feasible and affordable to implement real time BCPNN as a custom tiled ASIC in 28 nm technology with custom 3D DRAM - eBrain II - that consumes 3 kW for a human scale and 12 W for a rodent scale cortex model. Such implementations eminently fulfill the demands for field deployment.
Submitted 3 November, 2019;
originally announced November 2019.
-
K-TanH: Efficient TanH For Deep Learning
Authors:
Abhisek Kundu,
Alex Heinecke,
Dhiraj Kalamkar,
Sudarshan Srinivasan,
Eric C. Qin,
Naveen K. Mellempudi,
Dipankar Das,
Kunal Banerjee,
Bharat Kaul,
Pradeep Dubey
Abstract:
We propose K-TanH, a novel, highly accurate, hardware efficient approximation of the popular activation function TanH for Deep Learning. K-TanH consists of parameterized low-precision integer operations, such as shift and add/subtract (no floating point operation needed), where parameters are stored in very small look-up tables that can fit in CPU registers. K-TanH can work on various numerical formats, such as Float32 and BFloat16. High quality approximations to other activation functions, e.g., Sigmoid, Swish and GELU, can be derived from K-TanH. Our AVX512 implementation of K-TanH demonstrates $>5\times$ speed up over Intel SVML, and it is consistently superior in efficiency over other approximations that use floating point arithmetic. Finally, we achieve state-of-the-art BLEU score and convergence results for training the language translation model GNMT on WMT16 data sets with approximate TanH obtained via K-TanH on BFloat16 inputs.
Submitted 7 June, 2020; v1 submitted 17 September, 2019;
originally announced September 2019.
-
High Performance Scalable FPGA Accelerator for Deep Neural Networks
Authors:
Sudarshan Srinivasan,
Pradeep Janedula,
Saurabh Dhoble,
Sasikanth Avancha,
Dipankar Das,
Naveen Mellempudi,
Bharat Daga,
Martin Langhammer,
Gregg Baeckler,
Bharat Kaul
Abstract:
Low precision is the first order knob for achieving higher Artificial Intelligence Operations (AI-TOPS). However, the algorithmic space for sub-8-bit precision compute is diverse, with disruptive changes happening frequently, making FPGAs a natural choice for Deep Neural Network inference. In this work we present an FPGA-based accelerator for CNN inference acceleration. We use INT-8-2 compute (with 8-bit activations and 2-bit weights), which has recently shown promise in the literature, and which no known ASIC, CPU or GPU natively supports today. Using a novel Adaptive Logic Module (ALM) based design, as a departure from traditional DSP based designs, we are able to achieve a measured high performance of 5 AI-TOPS for Arria10 and project a performance of 76 AI-TOPS at 0.7 TOPS/W for Stratix10. This exceeds known CPU and GPU performance and comes close to the best known ASIC (TPU) numbers, while retaining the versatility of the FPGA platform for other applications.
Submitted 29 August, 2019;
originally announced August 2019.
-
A Reliable IoT-Based Embedded Health Care System for Diabetic Patients
Authors:
Zeyad A. Al-Odat,
Sudarshan K. Srinivasan,
Eman M. Al-Qtiemat,
Sana Shuja
Abstract:
This paper introduces a reliable health care system for diabetic patients based on the Internet of Things technology. A diabetic health care system with a hardware implementation is presented. The proposed work employs an Alaris 8100 infusion pump, a Keil LPC-1768 board, and an IoT cloud to monitor diabetic patients. The security of diabetic data over the cloud and the communication channel between health care system components are considered as part of the main contributions of this work. Moreover, an easy way to control and monitor the diabetic insulin pump is implemented. The patient's records are stored in the cloud using the Keil board that is connected to the infusion pump. The reliability of the proposed scheme is established by testing the system for five performance characteristics (availability, confidentiality, integrity, authentication, and authorization). The Keil board is embedded with an Ethernet port and a Cortex-M3 micro-controller that controls the insulin infusion pump. The secure hash algorithm and secure socket shell are employed to achieve the reliability components of the proposed scheme. The results show that the proposed design is reliable, secure and authentic according to different test experiments and a case study of the Markov model. Moreover, a 99.3% availability probability has been achieved after analyzing the case study.
Submitted 16 August, 2019;
originally announced August 2019.
-
Edge Computing-Enabled Cell-Free Massive MIMO Systems
Authors:
Sudarshan Mukherjee,
Jemin Lee
Abstract:
Mobile edge computing (MEC) has been introduced to provide additional computing capabilities at network edges in order to improve performance of latency critical applications. In this paper, we consider the cell-free (CF) massive MIMO framework with MEC functionalities implemented. We consider multiple types of users with different average time requirements for computing/processing the tasks, and consider access points (APs) with MEC servers and a central server (CS) with the cloud computing capability. After deriving successful communication and computing probabilities using stochastic geometry and queueing theory, we present the successful edge computing probability (SECP) for a target computation latency. Through numerical results, we also analyze the impact of the AP coverage and the offloading probability to the CS on the SECP. It is observed that the optimal probability of offloading to the CS in terms of the SECP decreases with the AP coverage. Finally, we numerically characterize the minimum required energy consumption for guaranteeing a desired level of SECP. It is observed that for any desired level of SECP, it is more energy efficient to have a larger number of APs than to have more antennas at each AP with a smaller AP density.
Submitted 16 August, 2019;
originally announced August 2019.
-
Noise spectral estimation methods and their impact on gravitational wave measurement of compact binary mergers
Authors:
Katerina Chatziioannou,
Carl-Johan Haster,
Tyson B. Littenberg,
Will M. Farr,
Sudarshan Ghonge,
Margaret Millhouse,
James A. Clark,
Neil Cornish
Abstract:
Estimating the parameters of gravitational wave signals detected by ground-based detectors requires an understanding of the properties of the detectors' noise. In particular, the most commonly used likelihood function for gravitational wave data analysis assumes that the noise is Gaussian, stationary, and of known frequency-dependent variance. The variance of the colored Gaussian noise is used as a whitening filter on the data before computation of the likelihood function. In practice the noise variance is not known and it evolves over timescales of dozens of seconds to minutes. We study two methods for estimating this whitening filter for ground-based gravitational wave detectors with the goal of performing parameter estimation studies. The first method uses large amounts of data separated from the specific segment we wish to analyze and computes the power spectral density of the noise through the mean-median Welch method. The second method uses the same data segment as the parameter estimation analysis, which potentially includes a gravitational wave signal, and obtains the whitening filter through a fit of the power spectrum of the data in terms of a sum of splines and Lorentzians. We compare these two methods and argue that the latter is more reliable for gravitational wave parameter estimation.
Submitted 5 November, 2019; v1 submitted 15 July, 2019;
originally announced July 2019.
-
Mixed Precision Training With 8-bit Floating Point
Authors:
Naveen Mellempudi,
Sudarshan Srinivasan,
Dipankar Das,
Bharat Kaul
Abstract:
Reduced precision computation for deep neural networks is one of the key areas addressing the widening compute gap driven by an exponential growth in model size. In recent years, deep learning training has largely migrated to 16-bit precision, with significant gains in performance and energy efficiency. However, attempts to train DNNs at 8-bit precision have met with significant challenges because of the higher precision and dynamic range requirements of back-propagation. In this paper, we propose a method to train deep neural networks using 8-bit floating point representation for weights, activations, errors, and gradients. In addition to reducing compute precision, we also reduced the precision requirements for the master copy of weights from 32-bit to 16-bit. We demonstrate state-of-the-art accuracy across multiple data sets (imagenet-1K, WMT16) and a broader set of workloads (Resnet-18/34/50, GNMT, Transformer) than previously reported. We propose an enhanced loss scaling method to augment the reduced subnormal range of 8-bit floating point for improved error propagation. We also examine the impact of quantization noise on generalization and propose a stochastic rounding technique to address gradient noise. As a result of applying all these techniques, we report slightly higher validation accuracy compared to full precision baseline.
Submitted 29 May, 2019;
originally announced May 2019.