-
Physics-based material parameters extraction from perovskite experiments via Bayesian optimization
Authors:
Hualin Zhan,
Viqar Ahmad,
Azul Mayon,
Grace Tabi,
Anh Dinh Bui,
Zhuofeng Li,
Daniel Walter,
Hieu Nguyen,
Klaus Weber,
Thomas White,
Kylie Catchpole
Abstract:
The ability to extract material parameters of perovskite from quantitative experimental analysis is essential for rational design of photovoltaic and optoelectronic applications. However, the difficulty of this analysis increases significantly with the complexity of the theoretical model and the number of material parameters for perovskite. Here we use Bayesian optimization to develop an analysis…
▽ More
The ability to extract material parameters of perovskite from quantitative experimental analysis is essential for rational design of photovoltaic and optoelectronic applications. However, the difficulty of this analysis increases significantly with the complexity of the theoretical model and the number of material parameters for perovskite. Here we use Bayesian optimization to develop an analysis platform that can extract up to 8 fundamental material parameters of an organometallic perovskite semiconductor from a transient photoluminescence experiment, based on a complex full physics model that includes drift-diffusion of carriers and dynamic defect occupation. An example study of thermal degradation reveals that the carrier mobility and trap-assisted recombination coefficient are reduced noticeably, while the defect energy level remains nearly unchanged. The reduced carrier mobility can dominate the overall effect on thermal degradation of perovskite solar cells by reducing the fill factor, despite the opposite effect of the reduced trap-assisted recombination coefficient on increasing the fill factor. In future, this platform can be conveniently applied to other experiments or to combinations of experiments, accelerating materials discovery and optimization of semiconductor materials for photovoltaics and other applications.
△ Less
Submitted 29 May, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
A RISC-V MCU with adaptive reverse body bias and ultra-low-power retention mode in 22 nm FD-SOI
Authors:
Heiner Bauer,
Marco Stolba,
Stefan Scholze,
Dennis Walter,
Christian Mayr,
Alexander Oefelein,
Sebastian Höppner,
André Scharfe,
Flo Schraut,
Holger Eisenreich
Abstract:
We present a low-power, energy efficient 32-bit RISC-V microprocessor unit (MCU) in 22 nm FD-SOI. It achieves ultra-low leakage,even at high temperatures, by using an adaptive reverse body biasing aware sign-off approach, a low-power optimized physical implementation, and custom SRAM macros with retention mode. We demonstrate the robustness of the chip with measurements over the full industrial te…
▽ More
We present a low-power, energy efficient 32-bit RISC-V microprocessor unit (MCU) in 22 nm FD-SOI. It achieves ultra-low leakage,even at high temperatures, by using an adaptive reverse body biasing aware sign-off approach, a low-power optimized physical implementation, and custom SRAM macros with retention mode. We demonstrate the robustness of the chip with measurements over the full industrial temperature range, from -40 °C to 125 °C. Our results match the state of the art (SOTA) with 4.8 uW / MHz at 50 MHz in active mode and surpass the SOTA in ultra-low-power retention mode.
△ Less
Submitted 13 October, 2023;
originally announced October 2023.
-
Adaptive Block Floating-Point for Analog Deep Learning Hardware
Authors:
Ayon Basumallik,
Darius Bunandar,
Nicholas Dronen,
Nicholas Harris,
Ludmila Levkova,
Calvin McCarter,
Lakshmi Nair,
David Walter,
David Widemann
Abstract:
Analog mixed-signal (AMS) devices promise faster, more energy-efficient deep neural network (DNN) inference than their digital counterparts. However, recent studies show that DNNs on AMS devices with fixed-point numbers can incur an accuracy penalty because of precision loss. To mitigate this penalty, we present a novel AMS-compatible adaptive block floating-point (ABFP) number representation. We…
▽ More
Analog mixed-signal (AMS) devices promise faster, more energy-efficient deep neural network (DNN) inference than their digital counterparts. However, recent studies show that DNNs on AMS devices with fixed-point numbers can incur an accuracy penalty because of precision loss. To mitigate this penalty, we present a novel AMS-compatible adaptive block floating-point (ABFP) number representation. We also introduce amplification (or gain) as a method for increasing the accuracy of the number representation without increasing the bit precision of the output. We evaluate the effectiveness of ABFP on the DNNs in the MLPerf datacenter inference benchmark -- realizing less than $1\%$ loss in accuracy compared to FLOAT32. We also propose a novel method of finetuning for AMS devices, Differential Noise Finetuning (DNF), which samples device noise to speed up finetuning compared to conventional Quantization-Aware Training.
△ Less
Submitted 12 May, 2022;
originally announced May 2022.
-
Exoplanet Characterization using Conditional Invertible Neural Networks
Authors:
Jonas Haldemann,
Victor Ksoll,
Daniel Walter,
Yann Alibert,
Ralf S. Klessen,
Willy Benz,
Ullrich Koethe,
Lynton Ardizzone,
Carsten Rother
Abstract:
The characterization of an exoplanet's interior is an inverse problem, which requires statistical methods such as Bayesian inference in order to be solved. Current methods employ Markov Chain Monte Carlo (MCMC) sampling to infer the posterior probability of planetary structure parameters for a given exoplanet. These methods are time consuming since they require the calculation of a large number of…
▽ More
The characterization of an exoplanet's interior is an inverse problem, which requires statistical methods such as Bayesian inference in order to be solved. Current methods employ Markov Chain Monte Carlo (MCMC) sampling to infer the posterior probability of planetary structure parameters for a given exoplanet. These methods are time consuming since they require the calculation of a large number of planetary structure models. To speed up the inference process when characterizing an exoplanet, we propose to use conditional invertible neural networks (cINNs) to calculate the posterior probability of the internal structure parameters. cINNs are a special type of neural network which excel in solving inverse problems. We constructed a cINN using FrEIA, which was then trained on a database of $5.6\cdot 10^6$ internal structure models to recover the inverse map** between internal structure parameters and observable features (i.e., planetary mass, planetary radius and composition of the host star). The cINN method was compared to a Metropolis-Hastings MCMC. For that we repeated the characterization of the exoplanet K2-111 b, using both the MCMC method and the trained cINN. We show that the inferred posterior probability of the internal structure parameters from both methods are very similar, with the biggest differences seen in the exoplanet's water content. Thus cINNs are a possible alternative to the standard time-consuming sampling methods. Indeed, using cINNs allows for orders of magnitude faster inference of an exoplanet's composition than what is possible using an MCMC method, however, it still requires the computation of a large database of internal structures to train the cINN. Since this database is only computed once, we found that using a cINN is more efficient than an MCMC, when more than 10 exoplanets are characterized using the same cINN.
△ Less
Submitted 31 January, 2022;
originally announced February 2022.
-
Hardware Implementation of an OPC UA Server for Industrial Field Devices
Authors:
Heiner Bauer,
Sebastian Höppner,
Chris Iatrou,
Zohra Charania,
Stephan Hartmann,
Saif-Ur Rehman,
Andreas Dixius,
Georg Ellguth,
Dennis Walter,
Johannes Uhlig,
Felix Neumärker,
Marc Berthel,
Marco Stolba,
Florian Kelber,
Leon Urbas,
Christian Mayr
Abstract:
Industrial plants suffer from a high degree of complexity and incompatibility in their communication infrastructure, caused by a wild mix of proprietary technologies. This prevents transformation towards Industry 4.0 and the Industrial Internet of Things. Open Platform Communications Unified Architecture (OPC UA) is a standardized protocol that addresses these problems with uniform and semantic co…
▽ More
Industrial plants suffer from a high degree of complexity and incompatibility in their communication infrastructure, caused by a wild mix of proprietary technologies. This prevents transformation towards Industry 4.0 and the Industrial Internet of Things. Open Platform Communications Unified Architecture (OPC UA) is a standardized protocol that addresses these problems with uniform and semantic communication across all levels of the hierarchy. However, its adoption in embedded field devices, such as sensors and actors, is still lacking due to prohibitive memory and power requirements of software implementations. We have developed a dedicated hardware engine that offloads processing of the OPC UA protocol and enables realization of compact and low-power field devices with OPC UA support. As part of a proof-of-concept embedded system we have implemented this engine in a 22 nm FDSOI technology. We measured performance, power consumption, and memory footprint of our test chip and compared it with a software implementation based on open62541 and a Raspberry Pi 2B. Our OPC UA hardware engine is 50 times more energy efficient and only requires 36 KiB of memory. The complete chip consumes only 24 mW under full load, making it suitable for low-power embedded applications.
△ Less
Submitted 3 May, 2021;
originally announced May 2021.
-
The SpiNNaker 2 Processing Element Architecture for Hybrid Digital Neuromorphic Computing
Authors:
Sebastian Höppner,
Yexin Yan,
Andreas Dixius,
Stefan Scholze,
Johannes Partzsch,
Marco Stolba,
Florian Kelber,
Bernhard Vogginger,
Felix Neumärker,
Georg Ellguth,
Stephan Hartmann,
Stefan Schiefer,
Thomas Hocker,
Dennis Walter,
Genting Liu,
Jim Garside,
Steve Furber,
Christian Mayr
Abstract:
This paper introduces the processing element architecture of the second generation SpiNNaker chip, implemented in 22nm FDSOI. On circuit level, the chip features adaptive body biasing for near-threshold operation, and dynamic voltage-and-frequency scaling driven by spiking activity. On system level, processing is centered around an ARM M4 core, similar to the processor-centric architecture of the…
▽ More
This paper introduces the processing element architecture of the second generation SpiNNaker chip, implemented in 22nm FDSOI. On circuit level, the chip features adaptive body biasing for near-threshold operation, and dynamic voltage-and-frequency scaling driven by spiking activity. On system level, processing is centered around an ARM M4 core, similar to the processor-centric architecture of the first generation SpiNNaker. To speed operation of subtasks, we have added accelerators for numerical operations of both spiking (SNN) and rate based (deep) neural networks (DNN). PEs communicate via a dedicated, custom-designed network-on-chip. We present three benchmarks showing operation of the whole processor element on SNN, DNN and hybrid SNN/DNN networks.
△ Less
Submitted 15 August, 2022; v1 submitted 15 March, 2021;
originally announced March 2021.
-
Symbolic Loop Compilation for Tightly Coupled Processor Arrays
Authors:
Michael Witterauf,
Dominik Walter,
Frank Hannig,
Jürgen Teich
Abstract:
Loop compilation for Tightly Coupled Processor Arrays (TCPAs), a class of massively parallel loop accelerators, entails solving NP-hard problems, yet depends on the loop bounds and number of available processing elements (PEs), parameters known only at runtime because of dynamic resource management and input sizes. Therefore, this article proposes a two-phase approach called symbolic loop compilat…
▽ More
Loop compilation for Tightly Coupled Processor Arrays (TCPAs), a class of massively parallel loop accelerators, entails solving NP-hard problems, yet depends on the loop bounds and number of available processing elements (PEs), parameters known only at runtime because of dynamic resource management and input sizes. Therefore, this article proposes a two-phase approach called symbolic loop compilation: At compile time, the necessary NP-complete problems are solved and the solutions compiled into a space-efficient symbolic configuration. At runtime, a concrete configuration is generated from the symbolic configuration according to the parameters values. We show that the latter phase, called instantiation, runs in polynomial time with its most complex step, program instantiation, not depending on the number of PEs. As validation, we performed symbolic loop compilation on real-world loops and measured time and space requirements. Our experiments confirm that a symbolic configuration is space-efficient and suited for systems with little memory -- often, a symbolic configuration is smaller than a single concrete configuration -- and that program instantiation scales well with the number of PEs -- for example, when instantiating a symbolic configuration of a matrix-matrix multiplication, the execution time is similar for $4\times 4$ and $32\times 32$ PEs.
△ Less
Submitted 12 January, 2021;
originally announced January 2021.
-
Evaluation of Output Embeddings for Fine-Grained Image Classification
Authors:
Zeynep Akata,
Scott Reed,
Daniel Walter,
Honglak Lee,
Bernt Schiele
Abstract:
Image classification has advanced significantly in recent years with the availability of large-scale image sets. However, fine-grained classification remains a major challenge due to the annotation cost of large numbers of fine-grained categories. This project shows that compelling classification performance can be achieved on such categories even without labeled training data. Given image and cla…
▽ More
Image classification has advanced significantly in recent years with the availability of large-scale image sets. However, fine-grained classification remains a major challenge due to the annotation cost of large numbers of fine-grained categories. This project shows that compelling classification performance can be achieved on such categories even without labeled training data. Given image and class embeddings, we learn a compatibility function such that matching embeddings are assigned a higher score than mismatching ones; zero-shot classification of an image proceeds by finding the label yielding the highest joint compatibility score. We use state-of-the-art image features and focus on different supervised attributes and unsupervised output embeddings either derived from hierarchies or learned from unlabeled text corpora. We establish a substantially improved state-of-the-art on the Animals with Attributes and Caltech-UCSD Birds datasets. Most encouragingly, we demonstrate that purely unsupervised output embeddings (learned from Wikipedia and improved with fine-grained text) achieve compelling results, even outperforming the previous supervised state-of-the-art. By combining different output embeddings, we further improve results.
△ Less
Submitted 28 August, 2015; v1 submitted 30 September, 2014;
originally announced September 2014.