Search | arXiv e-print repository

Optimal Low-Depth Quantum Signal-Processing Phase Estimation

Authors: Yulong Dong, Jonathan A. Gross, Murphy Yuezhen Niu

Abstract: Quantum effects like entanglement and coherent amplification can be used to drastically enhance the accuracy of quantum parameter estimation beyond classical limits. However, challenges such as decoherence and time-dependent errors hinder Heisenberg-limited amplification. We introduce Quantum Signal-Processing Phase Estimation algorithms that are robust against these challenges and achieve optimal performance as dictated by the Cramér-Rao bound. These algorithms use quantum signal transformation to decouple interdependent phase parameters into largely orthogonal ones, ensuring that time-dependent errors in one do not compromise the accuracy of learning the other. Combining provably optimal classical estimation with near-optimal quantum circuit design, our approach achieves an unprecedented standard deviation accuracy of $10^{-4}$ radians for estimating unwanted swap angles in superconducting two-qubit experiments, using low-depth ($<10$) circuits. This represents up to two orders of magnitude improvement over existing methods. Theoretically and numerically, we demonstrate the optimality of our algorithm against time-dependent phase errors, observing that the variance of the time-sensitive parameter $\varphi$ scales faster than the asymptotic Heisenberg scaling in the small-depth regime. Our results are rigorously validated against the quantum Fisher information, confirming our protocol's ability to achieve unmatched precision for two-qubit gate learning.

Submitted 17 June, 2024; originally announced July 2024.

Comments: 53 pages, 21 figures. arXiv admin note: substantial text overlap with arXiv:2209.11207

arXiv:2406.11779 [pdf, other]

Compact Proofs of Model Performance via Mechanistic Interpretability

Authors: Jason Gross, Rajashree Agrawal, Thomas Kwa, Euan Ong, Chun Hei Yip, Alex Gibson, Soufiane Noubir, Lawrence Chan

Abstract: In this work, we propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless noise as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.

Submitted 29 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: accepted to 2024 ICML MI Workshop (Spotlight)

arXiv:2405.15637 [pdf]

Clearing the Path for Software Sustainability

Authors: Jennifer Gross, Sofia Ouhbi

Abstract: The advancement of software sustainability encounters notable challenges, underscoring the necessity for understanding these challenges to facilitate significant progress and pave the way for effective solutions to advance software sustainability. This paper outlines key challenges identified in literature based on findings from a tertiary study. Challenges identified include: confusion regarding the definition of software sustainability, uncertainty about when to consider sustainability in software development, lack of assessment metrics and tools, narrow perspectives on sustainability in software systems, insufficient awareness and education, and a lack of serious considerations in practice. The paper aims at clarifying the confusion surrounding software sustainability to motivate effective solutions. The provided recommendations aim to give a more organized approach towards advancing sustainable software development, emphasizing comprehensive strategies, the integration of sustainability as a fundamental aspect of software development, actionable research directions, and the cultivation of a common understanding of sustainable software.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2404.03489 [pdf, other]

Design of Stickbug: a Six-Armed Precision Pollination Robot

Authors: Trevor Smith, Madhav Rijal, Christopher Tatsch, R. Michael Butts, Jared Beard, R. Tyler Cook, Andy Chu, Jason Gross, Yu Gu

Abstract: This work presents the design of Stickbug, a six-armed, multi-agent, precision pollination robot that combines the accuracy of single-agent systems with swarm parallelization in greenhouses. Precision pollination robots have often been proposed to offset the effects of a decreasing population of natural pollinators, but they frequently lack the required parallelization and scalability. Stickbug achieves this by allowing each arm and drive base to act as an individual agent, significantly reducing planning complexity. Stickbug uses a compact holonomic Kiwi drive to navigate narrow greenhouse rows, a tall mast to support multiple manipulators and reach plant heights, a detection model and classifier to identify Bramble flowers, and a felt-tipped end-effector for contact-based pollination. Initial experimental validation demonstrates that Stickbug can attempt over 1.5 pollinations per minute with a 50% success rate. Additionally, a Bramble flower perception dataset was created and is publicly available alongside Stickbug's software and design files.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 7 pages, 7 figures

arXiv:2401.09856 [pdf, other]

EDAF: An End-to-End Delay Analytics Framework for 5G-and-Beyond Networks

Authors: Samie Mostafavi, Marius Tillner, Gourav Prateek Sharma, James Gross

Abstract: Supporting applications in emerging domains like cyber-physical systems and human-in-the-loop scenarios typically requires adherence to strict end-to-end delay guarantees. Contributions of many tandem processes unfolding layer by layer within the wireless network result in violations of delay constraints, thereby severely degrading application performance. Meeting the application's stringent requirements necessitates coordinated optimization of the end-to-end delay by fine-tuning all contributing processes. To achieve this task, we designed and implemented EDAF, a framework to decompose packets' end-to-end delays and determine each component's significance for 5G network. We showcase EDAF on OpenAirInterface 5G uplink, modified to report timestamps across the data plane. By applying the obtained insights, we optimized end-to-end uplink delay by eliminating segmentation and frame-alignment delays, decreasing average delay from 12ms to 4ms.

Submitted 18 January, 2024; originally announced January 2024.

Comments: Submitted to the 11th International Workshop on Computer and Networking Experimental Research using Testbeds (CNERT 2024)

arXiv:2312.04917 [pdf]

doi 10.1007/978-3-031-49266-2_10

Operationalizing Assurance Cases for Data Scientists: A Showcase of Concepts and Tooling in the Context of Test Data Quality for Machine Learning

Authors: Lisa Jöckel, Michael Kläs, Janek Groß, Pascal Gerber, Markus Scholz, Jonathan Eberle, Marc Teschner, Daniel Seifert, Richard Hawkins, John Molloy, Jens Ottnad

Abstract: Assurance Cases (ACs) are an established approach in safety engineering to argue quality claims in a structured way. In the context of quality assurance for Machine Learning (ML)-based software components, ACs are also being discussed and appear promising. Tools for operationalizing ACs do exist, yet mainly focus on supporting safety engineers on the system level. However, assuring the quality of an ML component within the system is commonly the responsibility of data scientists, who are usually less familiar with these tools. To address this gap, we propose a framework to support the operationalization of ACs for ML components based on technologies that data scientists use on a daily basis: Python and Jupyter Notebook. Our aim is to make the process of creating ML-related evidence in ACs more effective. Results from the application of the framework, documented through notebooks, can be integrated into existing AC tools. We illustrate the application of the framework on an example excerpt concerned with the quality of the test data.

Submitted 8 December, 2023; originally announced December 2023.

Comments: Accepted for publication at International Conference on Product-Focused Software Process Improvement (Profes 2023), https://conf.researchr.org/home/profes-2023

arXiv:2311.14982 [pdf, other]

Active Queue Management with Data-Driven Delay Violation Probability Predictors

Authors: Samie Mostafavi, Neelabhro Roy, György Dán, James Gross

Abstract: The increasing demand for latency-sensitive applications has necessitated the development of sophisticated algorithms that efficiently manage packets with end-to-end delay targets traversing the networked infrastructure. Network components must consider minimizing the packets' end-to-end delay violation probabilities (DVP) as a guiding principle throughout the transmission path to ensure timely deliveries. Active queue management (AQM) schemes are commonly used to mitigate congestion by dropping packets and controlling queuing delay. Today's established AQM schemes are threshold-driven, identifying congestion and triggering packet dropping using predefined criteria that are unaware of packets' DVPs. In this work, we propose a novel framework, Delta, that combines end-to-end delay characterization with AQM for minimizing DVP. In a queuing theoretic environment, we show that such a policy is feasible by utilizing a data-driven approach to predict the queued packets' DVPs. That enables Delta AQM to effectively handle links with arbitrary stationary service time processes. The implementation is described in detail, and its performance is evaluated and compared with state-of-the-art AQM algorithms. Our results show that Delta outperforms current AQM schemes substantially, in particular in scenarios where high reliability, i.e. high quantiles of the tail latency distribution, is of interest.

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2311.01279 [pdf, other]

doi 10.1145/3583740.3626819

ExPECA: An Experimental Platform for Trustworthy Edge Computing Applications

Authors: Samie Mostafavi, Vishnu Narayanan Moothedath, Stefan Rönngren, Neelabhro Roy, Gourav Prateek Sharma, Sangwon Seo, Manuel Olguín Muñoz, James Gross

Abstract: This paper presents ExPECA, an edge computing and wireless communication research testbed designed to tackle two pressing challenges: comprehensive end-to-end experimentation and high levels of experimental reproducibility. Leveraging OpenStack-based Chameleon Infrastructure (CHI) framework for its proven flexibility and ease of operation, ExPECA is located in a unique, isolated underground facility, providing a highly controlled setting for wireless experiments. The testbed is engineered to facilitate integrated studies of both communication and computation, offering a diverse array of Software-Defined Radios (SDR) and Commercial Off-The-Shelf (COTS) wireless and wired links, as well as containerized computational environments. We exemplify the experimental possibilities of the testbed using OpenRTiST, a latency-sensitive, bandwidth-intensive application, and analyze its performance. Lastly, we highlight an array of research domains and experimental setups that stand to gain from ExPECA's features, including closed-loop applications and time-sensitive networking.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2307.10648 [pdf, other]

Data-Driven Latency Probability Prediction for Wireless Networks: Focusing on Tail Probabilities

Authors: Samie Mostafavi, Gourav Prateek Sharma, James Gross

Abstract: With the emergence of new application areas, such as cyber-physical systems and human-in-the-loop applications, there is a need to guarantee a certain level of end-to-end network latency with extremely high reliability, e.g., 99.999%. While mechanisms specified under IEEE 802.1as time-sensitive networking (TSN) can be used to achieve these requirements for switched Ethernet networks, implementing TSN mechanisms in wireless networks is challenging due to their stochastic nature. To conform the wireless link to a reliability level of 99.999%, the behavior of extremely rare outliers in the latency probability distribution, or the tail of the distribution, must be analyzed and controlled. This work proposes predicting the tail of the latency distribution using state-of-the-art data-driven approaches, such as mixture density networks (MDN) and extreme value mixture models, to estimate the likelihood of rare latencies conditioned on the network parameters, which can be used to make more informed decisions in wireless transmission. Actual latency measurements of IEEE 802.11g (WiFi), commercial private and a software-defined 5G network are used to benchmark the proposed approaches and evaluate their sensitivities concerning the tail probabilities.

Submitted 20 July, 2023; originally announced July 2023.

Comments: Submitted to IEEE Global Communications (GLOBECOM) 2023 conference

arXiv:2307.07365 [pdf, other]

Fully Coupled Forced Response Analysis of Nonlinear Turbine Blade Vibrations in the Frequency Domain

Authors: Christian Berthold, Johann Gross, Christian Frey, Malte Krack

Abstract: For the first time, a fully-coupled Harmonic Balance method is developed for the forced response of turbomachinery blades. The method is applied to a state-of-the-art model of a turbine bladed disk with interlocked shrouds subjected to wake-induced loading. The recurrent opening and closing of the pre-loaded shroud contact causes a softening effect, leading to turning points in the amplitude-frequency curve near resonance. Therefore, the coupled solver is embedded into a numerical path continuation framework. Two variants are developed: the coupled continuation of the solution path, and the coupled re-iteration of selected solution points. While the re-iteration variant is slightly more costly per solution point, it has the important advantage that it can be run completely in parallel, which substantially reduces the wall clock time. It is shown that wake- and vibration-induced flow fields do not linearly superimpose, leading to a severe underestimation of the resonant vibration level by the influence-coefficient-based state-of-the-art methods (which rely on this linearity assumption).

Submitted 14 July, 2023; originally announced July 2023.

Comments: 24 pages, 14 figures, preprint submitted to Journal of Computers and Structures

arXiv:2307.07133 [pdf, other]

Step-GRAND: A Low Latency Universal Soft-input Decoder

Authors: Syed Mohsin Abbas, Marwan Jalaleddine, Chi-Ying Tsui, Warren J. Gross

Abstract: GRAND features both soft-input and hard-input variants that are well suited to efficient hardware implementations that can be characterized with achievable average and worst-case decoding latency. This paper introduces step-GRAND, a soft-input variant of GRAND that, in addition to achieving appealing average decoding latency, also reduces the worst-case decoding latency of the corresponding hardware implementation. The hardware implementation results demonstrate that the proposed step-GRAND can decode CA-polar code $(128,105+11)$ with an average information throughput of $47.7$ Gbps at the target FER of $\leq10^{-7}$. Furthermore, the proposed step-GRAND hardware is $10\times$ more area efficient than the previous soft-input ORBGRAND hardware implementation, and its worst-case latency is $\frac{1}{6.8}\times$ that of the previous ORBGRAND hardware.

Submitted 26 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: Submitted to 2023 IEEE Globecom Workshops

arXiv:2306.17703 [pdf, other]

doi 10.33012/navi.608

Evaluation of the Benefits of Zero Velocity Update in Decentralized EKF-Based Cooperative Localization Algorithms for GNSS-Denied Multi-Robot Systems

Authors: Cagri Kilic, Eduardo Gutierrez, Jason N. Gross

Abstract: This paper proposes the cooperative use of zero velocity update (ZU) in a decentralized extended Kalman filter (DEKF) based localization algorithm for multi-robot systems. The filter utilizes inertial measurement unit (IMU), ultra-wideband (UWB), and odometry velocity measurements to improve the localization performance of the system in the presence of a GNSS-denied environment. The contribution of this work is to evaluate the benefits of using ZU in a DEKF-based localization algorithm. The algorithm is tested with real hardware in a video motion capture facility and a Robot Operating System (ROS) based simulation environment for unmanned ground vehicles (UGV). Both simulation and real-world experiments are performed to show the effectiveness of using ZU in one robot to reinstate the localization of other robots in a multi-robot system. Experimental results from GNSS-denied simulation and real-world environments show that using ZU with simple heuristics in the DEKF significantly improves the 3D localization accuracy.

Submitted 30 June, 2023; originally announced June 2023.

Comments: 18 pages, preprint version, the manuscript is accepted for publication in NAVIGATION, the Journal of the Institute of Navigation. Submitted:10-11-2022, Revised: 21-04-2023, Accepted:23-06-2023

Journal ref: NAVIGATION: Journal of the Institute of Navigation December 2023, 70 (4) navi.608

arXiv:2305.19586 [pdf, other]

CryptOpt: Automatic Optimization of Straightline Code

Authors: Joel Kuepper, Andres Erbsen, Jason Gross, Owen Conoly, Chuyue Sun, Samuel Tian, David Wu, Adam Chlipala, Chitchanok Chuengsatiansup, Daniel Genkin, Markus Wagner, Yuval Yarom

Abstract: Manual engineering of high-performance implementations typically consumes many resources and requires in-depth knowledge of the hardware. Compilers try to address these problems; however, they are limited by design in what they can do. To address this, we present CryptOpt, an automatic optimizer for long stretches of straightline code. Experimental results across eight hardware platforms show that CryptOpt achieves a speed-up factor of up to 2.56 over current off-the-shelf compilers.

Submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.14872 [pdf]

Timeseries-aware Uncertainty Wrappers for Uncertainty Quantification of Information-Fusion-Enhanced AI Models based on Machine Learning

Authors: Janek Groß, Michael Kläs, Lisa Jöckel, Pascal Gerber

Abstract: As the use of Artificial Intelligence (AI) components in cyber-physical systems is becoming more common, the need for reliable system architectures arises. While data-driven models excel at perception tasks, model outcomes are usually not dependable enough for safety-critical applications. In this work, we present a timeseries-aware uncertainty wrapper for dependable uncertainty estimates on timeseries data. The uncertainty wrapper is applied in combination with information fusion over successive model predictions in time. The application of the uncertainty wrapper is demonstrated with a traffic sign recognition use case. We show that it is possible to increase model accuracy through information fusion and additionally increase the quality of uncertainty estimates through timeseries-aware input quality features.

Submitted 31 May, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: 8 pages, 7 figures, VERDI workshop collocated with the DSN conference 2023

arXiv:2305.02521 [pdf, other]

Towards a Scalable Proof Engine: A Performant Prototype Rewriting Primitive for Coq

Authors: Jason Gross, Andres Erbsen, Jade Philipoom, Rajashree Agrawal, Adam Chlipala

Abstract: We address the challenges of scaling verification efforts to match the increasing complexity and size of systems. We propose a research agenda aimed at building a performant proof engine by studying the asymptotic performance of proof engines and redesigning their building blocks. As a case study, we explore equational rewriting and introduce a novel prototype proof engine building block for rewriting in Coq, utilizing proof by reflection for enhanced performance. Our prototype implementation can significantly improve the development of verified compilers, as demonstrated in a case study with the Fiat Cryptography toolchain. The resulting extracted command-line compiler is about 1000$\times$ faster while featuring simpler compiler-specific proofs. This work lays some foundation for scaling verification efforts and contributes to the broader goal of developing a proof engine with good asymptotic performance, ultimately aimed at enabling the verification of larger and more complex systems.

Submitted 10 June, 2024; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: Preprint of a submission under consideration for Selected Extended Papers of ITP 2022 in the Journal of Automated Reasoning. arXiv admin note: substantial text overlap with arXiv:2205.00862

arXiv:2304.11763 [pdf, other]

The Case for Hierarchical Deep Learning Inference at the Network Edge

Authors: Ghina Al-Atat, Andrea Fresa, Adarsh Prasad Behera, Vishnu Narayanan Moothedath, James Gross, Jaya Prakash Champati

Abstract: Resource-constrained Edge Devices (EDs), e.g., IoT sensors and microcontroller units, are expected to make intelligent decisions using Deep Learning (DL) inference at the edge of the network. Toward this end, there is a significant research effort in developing tinyML models - Deep Learning (DL) models with reduced computation and memory storage requirements - that can be embedded on these devices. However, tinyML models have lower inference accuracy. On a different front, DNN partitioning and inference offloading techniques were studied for distributed DL inference between EDs and Edge Servers (ESs). In this paper, we explore Hierarchical Inference (HI), a novel approach proposed by Vishnu et al. 2023, arXiv:2304.00891v1, for performing distributed DL inference at the edge. Under HI, for each data sample, an ED first uses a local algorithm (e.g., a tinyML model) for inference. Depending on the application, if the inference provided by the local algorithm is incorrect or further assistance is required from large DL models on edge or cloud, only then the ED offloads the data sample. At the outset, HI seems infeasible as the ED, in general, cannot know if the local inference is sufficient or not. Nevertheless, we present the feasibility of implementing HI for machine fault detection and image classification applications. We demonstrate its benefits using quantitative analysis and argue that using HI will result in low latency, bandwidth savings, and energy savings in edge AI systems.

Submitted 23 April, 2023; originally announced April 2023.

Comments: This paper consists of 9 pages, with 6 tables and 8 figures

arXiv:2304.11207 [pdf, other]

SSS3D: Fast Neural Architecture Search For Efficient Three-Dimensional Semantic Segmentation

Authors: Olivier Therrien, Marihan Amein, Zhuoran Xiong, Warren J. Gross, Brett H. Meyer

Abstract: We present SSS3D, a fast multi-objective NAS framework designed to find computationally efficient 3D semantic scene segmentation networks. It uses RandLA-Net, an off-the-shelf point-based network, as a super-network to enable weight sharing and reduce search time by 99.67% for single-stage searches. SSS3D has a complex search space composed of sampling and architectural parameters that can form $2.88 \times 10^{17}$ possible networks. To further reduce search time, SSS3D splits the complete search space and introduces a two-stage search that finds optimal subnetworks in 54% of the time required by single-stage searches.

Submitted 21 April, 2023; originally announced April 2023.

Comments: Accepted as a full paper by the TinyML Research Symposium 2023

arXiv:2304.01693 [pdf, other]

Performance of 802.11be Wi-Fi 7 with Multi-Link Operation on AR Applications

Authors: Molham Alsakati, Charlie Pettersson, Sebastian Max, Vishnu Narayanan Moothedath, James Gross

Abstract: Since its first release in the late 1990s, Wi-Fi has been updated to keep up with evolving user needs. Recently, Wi-Fi and other radio access technologies have been pushed to their edge when serving Augmented Reality (AR) applications. AR applications require high throughput, low latency, and high reliability to ensure a high-quality user experience. The 802.11be amendment, which will be marketed as Wi-Fi 7, introduces several features that aim to enhance its capabilities to support challenging applications like AR. One of the main features introduced in this amendment is Multi-Link Operation (MLO) which allows nodes to transmit and receive over multiple links concurrently. When using MLO, traffic is distributed among links using an implementation-specific traffic-to-link allocation policy. This paper aims to evaluate the performance of MLO, using different policies, in serving AR applications compared to Single-Link (SL). Experimental simulations using an event-based Wi-Fi simulator have been conducted. Our results show the general superiority of MLO when serving AR applications. MLO achieves lower latency and serves a higher number of AR users compared to SL with the same frequency resources. In addition, increasing the number of links can improve the performance of MLO. Regarding traffic-to-link allocation policies, we found that policies can be more susceptible to channel blocking, resulting in possible performance degradation.

Submitted 4 April, 2023; originally announced April 2023.

arXiv:2304.01299 [pdf]

Towards Deterministic Communications in 6G Networks: State of the Art, Open Challenges and the Way Forward

Authors: Gourav Prateek Sharma, Dhruvin Patel, Joachim Sachs, Marilet De Andrade, Janos Farkas, Janos Harmatos, Balazs Varga, Hans-Peter Bernhard, Raheeb Muzaffar, Mahin K. Atiq, Frank Duerr, Dietmar Bruckner, Edgardo Montesdeoca, Drissa Houatra, Hongwei Zhang, James Gross

Abstract: Over the last decade, society and industries have been undergoing rapid digitization that is expected to lead to the evolution of the cyber-physical continuum. End-to-end deterministic communications infrastructure is the essential glue that will bridge the digital and physical worlds of the continuum. We describe the state of the art and open challenges with respect to contemporary deterministic communications and compute technologies: 3GPP 5G, IEEE Time-Sensitive Networking, IETF DetNet, OPC UA as well as edge computing. While these technologies represent significant technological advancements towards networking Cyber-Physical Systems (CPS), we argue in this paper that they rather represent a first generation of systems which are still limited in different dimensions. In contrast, realizing future deterministic communication systems requires, firstly, seamless convergence between these technologies and, secondly, scalability to support heterogeneous (time-varying) requirements arising from diverse CPS applications. In addition, future deterministic communication networks will have to provide such characteristics end-to-end, which for CPS refers to the entire communication and computation loop, from sensors to actuators. In this paper, we discuss the state of the art regarding the main challenges towards these goals: predictability, end-to-end technology integration, end-to-end security, and scalable vertical application interfacing. We then present our vision regarding viable approaches and technological enablers to overcome these four central challenges. Key approaches to leverage in that regard are 6G system evolutions, wireless friendly integration of 6G into TSN and DetNet, novel end-to-end security approaches, efficient edge-cloud integrations, data-driven approaches for stochastic characterization and prediction, as well as leveraging digital twins towards system awareness.

Submitted 3 April, 2023; originally announced April 2023.

Comments: 22 pages, 8 figures

arXiv:2304.00891 [pdf, ps, other]

Online Algorithms for Hierarchical Inference in Deep Learning applications at the Edge

Authors: Vishnu Narayanan Moothedath, Jaya Prakash Champati, James Gross

Abstract: We consider a resource-constrained Edge Device (ED), such as an IoT sensor or a microcontroller unit, embedded with a small-size ML model (S-ML) for a generic classification application and an Edge Server (ES) that hosts a large-size ML model (L-ML). Since the inference accuracy of S-ML is lower than that of the L-ML, offloading all the data samples to the ES results in high inference accuracy, but it defeats the purpose of embedding S-ML on the ED and forfeits the benefits of reduced latency, bandwidth savings, and energy efficiency of doing local inference. In order to get the best out of both worlds, i.e., the benefits of doing inference on the ED and the benefits of doing inference on ES, we explore the idea of Hierarchical Inference (HI), wherein S-ML inference is only accepted when it is correct, otherwise the data sample is offloaded for L-ML inference. However, the ideal implementation of HI is infeasible as the correctness of the S-ML inference is not known to the ED. We propose an online meta-learning framework that the ED can use to predict the correctness of the S-ML inference. In particular, we propose to use the maximum softmax value output by S-ML for a data sample and decide whether to offload it or not. The resulting online learning problem turns out to be a Prediction with Expert Advice (PEA) problem with continuous expert space. We propose two different algorithms and prove sublinear regret bounds for them without any assumption on the smoothness of the loss function. We evaluate and benchmark the performance of the proposed algorithms for an image classification application using four datasets, namely Imagenette, Imagewoof, MNIST, and CIFAR-10.

Submitted 15 February, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: The original version was submitted to a journal and was later revised. The updated version was accepted in a journal and will be published soon. The 'Journal reference' will be updated as and when the information is available
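
The offloading rule described in the abstract above (accept the S-ML prediction when its maximum softmax value is high enough, otherwise offload to the L-ML) can be illustrated with a minimal sketch. The function names `softmax` and `decide`, the logit values, and the fixed `threshold` are hypothetical and chosen only for illustration; the paper learns the decision online, which this sketch does not attempt.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold):
    """Return ("local", label, confidence) or ("offload", None, confidence).

    Assumption: a fixed threshold stands in for the online-learned
    decision rule described in the abstract.
    """
    probs = softmax(logits)
    confidence = max(probs)
    label = probs.index(confidence)
    if confidence >= threshold:
        # S-ML is confident enough: keep the local prediction.
        return ("local", label, confidence)
    # S-ML is uncertain: send the sample to the edge server's L-ML.
    return ("offload", None, confidence)


if __name__ == "__main__":
    print(decide([4.0, 0.0, 0.0], threshold=0.8))  # peaked scores
    print(decide([1.0, 0.9, 0.8], threshold=0.8))  # nearly flat scores
```

A peaked score vector yields a high maximum softmax value and is handled locally, while a nearly flat one falls below the threshold and is offloaded.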

arXiv:2303.16322 [pdf, other]

FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation

Authors: Zhuoran Xiong, Marihan Amein, Olivier Therrien, Warren J. Gross, Brett H. Meyer

Abstract: We present FMAS, a fast multi-objective neural architecture search framework for semantic segmentation. FMAS subsamples the structure and pre-trained parameters of DeepLabV3+, without fine-tuning, dramatically reducing training time during search. To further reduce candidate evaluation time, we use a subset of the validation dataset during the search. Only the final, Pareto non-dominated, candidates are ultimately fine-tuned using the complete training set. We evaluate FMAS by searching for models that effectively trade accuracy and computational cost on the PASCAL VOC 2012 dataset. FMAS finds competitive designs quickly, e.g., taking just 0.5 GPU days to discover a DeepLabV3+ variant that reduces FLOPs and parameters by 10$\%$ and 20$\%$ respectively, for less than 3$\%$ increased error. We also search on an edge device called GAP8 and use its latency as the metric. FMAS is capable of finding a 2.2$\times$ faster network with 7.61$\%$ MIoU loss.

Submitted 28 March, 2023; originally announced March 2023.

Comments: Accepted as a full paper by the TinyML Research Symposium 2023

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2302.12454 [pdf, ps, other]

Stochastic Simulated Quantum Annealing for Fast Solution of Combinatorial Optimization Problems

Authors: Naoya Onizawa, Ryoma Sasaki, Duckgyu Shin, Warren J. Gross, Takahiro Hanyu

Abstract: In this paper, we introduce stochastic simulated quantum annealing (SSQA) for large-scale combinatorial optimization problems. SSQA is designed based on stochastic computing and quantum Monte Carlo, which can simulate quantum annealing (QA) by using multiple replicas of spins (probabilistic bits) in classical computing. The use of stochastic computing leads to an efficient parallel spin-state update algorithm, enabling quick search for a solution around the global minimum energy. Therefore, SSQA realizes quantum-like annealing for large-scale problems and can handle fully connected models in combinatorial optimization, unlike QA. The proposed method is evaluated in MATLAB on graph isomorphism problems, which are typical combinatorial optimization problems. The proposed method achieves a convergence speed an order of magnitude faster than a conventional stochastic simulated annealing method. Additionally, it can handle a 100-times larger problem size compared to QA and a 25-times larger problem size compared to a traditional SA method, for similar convergence probabilities.

Submitted 28 June, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

Comments: 14 pages, 8 figures

arXiv:2212.14486 [pdf, other]

Examining Political Rhetoric with Epistemic Stance Detection

Authors: Ankita Gupta, Su Lin Blodgett, Justin H Gross, Brendan O'Connor

Abstract: Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders -- respected allies and opposed bogeymen -- across U.S. political ideologies.

Submitted 5 January, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

Comments: Forthcoming in Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS) at EMNLP 2022

arXiv:2212.12965 [pdf, other]

BD-KD: Balancing the Divergences for Online Knowledge Distillation

Authors: Ibtihel Amara, Nazanin Sepahvand, Brett H. Meyer, Warren J. Gross, James J. Clark

Abstract: Knowledge distillation (KD) has gained a lot of attention in the field of model compression for edge devices thanks to its effectiveness in compressing large powerful networks into smaller lower-capacity models. Online distillation, in which both the teacher and the student are learning collaboratively, has also gained much interest due to its ability to improve on the performance of the networks involved. The Kullback-Leibler (KL) divergence ensures the proper knowledge transfer between the teacher and student. However, most online KD techniques present some bottlenecks under the network capacity gap. By cooperatively and simultaneously training the models, the KL distance becomes incapable of properly minimizing the teacher's and student's distributions. Alongside accuracy, critical edge device applications are in need of well-calibrated compact networks. Confidence calibration provides a sensible way of getting trustworthy predictions. We propose BD-KD: Balancing of Divergences for online Knowledge Distillation. We show that adaptively balancing between the reverse and forward divergences shifts the focus of the training strategy to the compact student network without limiting the teacher network's learning process. We demonstrate that, by performing this balancing design at the level of the student distillation loss, we improve upon both performance accuracy and calibration of the compact student network. We conducted extensive experiments using a variety of network architectures and show improvements on multiple datasets including CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach through comprehensive comparisons and ablations with current state-of-the-art online and offline KD techniques.

Submitted 25 December, 2022; originally announced December 2022.

arXiv:2212.06100 [pdf, other]

Realistic Modeling of Human Timings for Wearable Cognitive Assistance

Authors: Manuel O. J. Olguín Muñoz, Vishnu N. Moothedath, Jaya Prakash Champati, Roberta Klatzky, Mahadev Satyanarayanan, James Gross

Abstract: Wearable Cognitive Assistance (WCA) applications present a challenge to benchmark and characterize due to their human-in-the-loop nature. Employing user testing to optimize system parameters is generally not feasible, given the scope of the problem and the number of observations needed to detect small but important effects in controlled experiments. Considering the intended mass-scale deployment of WCA applications in the future, there exists a need for tools enabling human-independent benchmarking. We present in this paper the first model for the complete end-to-end emulation of humans in WCA. We build this model through statistical analysis of data collected from previous work in this field, and demonstrate its utility by studying application task durations. Compared to first-order approximations, our model shows a ~36% larger gap between step execution times at high system impairment versus low. We further introduce a novel framework for stochastic optimization of resource consumption-responsiveness tradeoffs in WCA, and show that by combining this framework with our realistic model of human behavior, significant reductions of up to 50% in the number of processed frame samples and 20% in energy consumption can be achieved with respect to the state-of-the-art.

Submitted 12 December, 2022; originally announced December 2022.

Comments: 16 total pages. 12 figures, 2 tables, 1 appendix. Main document body by Manuel Olguín Muñoz and Vishnu N. Moothedath; appendix by Vishnu N. Moothedath and Jaya Prakash Champati; editing and feedback by all authors; funding by James Gross and Mahadev Satyanarayanan. Submitted to IEEE Transactions on Mobile Computing

arXiv:2211.17093 [pdf, other]

CutFEM forward modeling for EEG source analysis

Authors: Tim Erdbrügger, Andreas Westhoff, Malte Hoeltershinken, Jan-Ole Radecke, Yvonne Buschermoehle, Alena Buyx, Fabrice Wallois, Sampsa Pursiainen, Joachim Gross, Rebekka Lencer, Christian Engwer, Carsten Wolters

Abstract: Source analysis of Electroencephalography (EEG) data requires the computation of the scalp potential induced by current sources in the brain. This so-called EEG forward problem is based on an accurate estimation of the volume conduction effects in the human head, represented by a partial differential equation which can be solved using the finite element method (FEM). FEM offers flexibility when modeling anisotropic tissue conductivities but requires a volumetric discretization, a mesh, of the head domain. Structured hexahedral meshes are easy to create in an automatic fashion, while tetrahedral meshes are better suited to model curved geometries. Tetrahedral meshes thus offer better accuracy, but are more difficult to create. Methods: We introduce CutFEM for EEG forward simulations to integrate the strengths of hexahedra and tetrahedra. It belongs to the family of unfitted finite element methods, decoupling mesh and geometry representation. Following a description of the method, we will employ CutFEM in both controlled spherical scenarios and the reconstruction of somatosensory evoked potentials. Results: CutFEM outperforms competing FEM approaches with regard to numerical accuracy, memory consumption and computational speed while being able to mesh arbitrarily touching compartments. Conclusion: CutFEM balances numerical accuracy, computational efficiency and a smooth approximation of complex geometries that has previously not been available in FEM-based EEG forward modeling.

Submitted 30 November, 2022; originally announced November 2022.

Comments: 9 pages, 8 figures

arXiv:2211.10665 [pdf, other]

CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version)

Authors: Joel Kuepper, Andres Erbsen, Jason Gross, Owen Conoly, Chuyue Sun, Samuel Tian, David Wu, Adam Chlipala, Chitchanok Chuengsatiansup, Daniel Genkin, Markus Wagner, Yuval Yarom

Abstract: Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but with consequent performance penalties vs. the best-known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g. producing new fastest-known implementations of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1 for the Intel $12^{th}$ and $13^{th}$ generations.

Submitted 21 May, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

arXiv:2209.11207 [pdf, other]

Beyond Heisenberg Limit Quantum Metrology through Quantum Signal Processing

Authors: Yulong Dong, Jonathan Gross, Murphy Yuezhen Niu

Abstract: Leveraging quantum effects in metrology such as entanglement and coherence allows one to measure parameters with enhanced sensitivity. However, time-dependent noise can disrupt such Heisenberg-limited amplification. We propose a quantum-metrology method based on the quantum-signal-processing framework to overcome these realistic noise-induced limitations in practical quantum metrology. Our algorithm separates the gate parameter $\varphi$ (single-qubit Z phase) that is susceptible to time-dependent error from the target gate parameter $θ$ (swap-angle between |10> and |01> states) that is largely free of time-dependent error. Our method achieves an accuracy of $10^{-4}$ radians in standard deviation for learning $θ$ in superconducting-qubit experiments, outperforming existing alternative schemes by two orders of magnitude. We also demonstrate the increased robustness in learning time-dependent gate parameters through fast Fourier transformation and sequential phase difference. We show both theoretically and numerically that there is an interesting transition of the optimal metrology variance scaling as a function of circuit depth $d$ from the pre-asymptotic regime $d \ll 1/θ$ to Heisenberg limit $d \to \infty$. Remarkably, in the pre-asymptotic regime our method's estimation variance on time-sensitive parameter $\varphi$ scales faster than the asymptotic Heisenberg limit as a function of depth, $\text{Var}(\hat{\varphi})\approx 1/d^4$. Our work is the first quantum-signal-processing algorithm that demonstrates practical application in laboratory quantum computers.

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2208.02070 [pdf, other]

Efficient Fine-Tuning of Compressed Language Models with Learners

Authors: Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

Abstract: Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency via compression techniques, e.g., pruning, these works do not explicitly address the computational challenges of training to downstream tasks. We introduce Learner modules and priming, novel methods for fine-tuning that exploit the overparameterization of pre-trained language models to gain benefits in convergence speed and resource utilization. Learner modules navigate the double bind of 1) training efficiently by fine-tuning a subset of parameters, and 2) training effectively by ensuring quick convergence and high metric scores. Our results on DistilBERT demonstrate that learners perform on par with or surpass the baselines. Learners train 7x fewer parameters than state-of-the-art methods on GLUE. On CoLA, learners fine-tune 20% faster, and have significantly lower resource utilization.

Submitted 3 August, 2022; originally announced August 2022.

Comments: 8 pages, 9 figures, 2 tables, presented at ICML 2022 workshop on Hardware-Aware Efficient Training (HAET 2022)

arXiv:2207.13629 [pdf, other]

doi 10.55417/fr.2022054

Proprioceptive Slip Detection for Planetary Rovers in Perceptually Degraded Extraterrestrial Environments

Authors: Cagri Kilic, Yu Gu, Jason N. Gross

Abstract: Slip detection is of fundamental importance for the safety and efficiency of rovers driving on the surface of extraterrestrial bodies. Current planetary rover slip detection systems rely on visual perception on the assumption that sufficient visual features can be acquired in the environment. However, visual-based methods are prone to suffer in perceptually degraded planetary environments with dominant low terrain features such as regolith, glacial terrain, salt-evaporites, and poor lighting conditions such as dark caves and permanently shadowed regions. Relying only on visual sensors for slip detection also requires additional computational power and reduces the rover traversal rate. This paper answers the question of how to detect wheel slippage of a planetary rover without depending on visual perception. In this respect, we propose a slip detection system that obtains its information from a proprioceptive localization framework that is capable of providing reliable, continuous, and computationally efficient state estimation over hundreds of meters. This is accomplished by using zero velocity update, zero angular rate update, and non-holonomic constraints as pseudo-measurement updates on an inertial navigation system framework. The proposed method is evaluated on actual hardware and field-tested in a planetary-analog environment. The method achieves greater than 92% slip detection accuracy for distances around 150 m using only an IMU and wheel encoders.

Submitted 29 July, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: 24 pages, 28 figures. Accepted for publication in Field Robotics

arXiv:2206.10305 [pdf, ps, other]

Analysis of Scale-Variant Robust Kernel Optimization for Non-linear Least Squares Problems

Authors: Shounak Das, Jason Gross

Abstract: In this article, we present a method for increasing adaptivity of an existing robust estimation algorithm by learning two parameters to better fit the residual distribution. The analyzed method uses these two parameters to calculate weights for Iterative Re-weighted Least Squares. This adaptive nature of the weights can be helpful in situations where the noise level varies in the measurements. We test our algorithm first on the point cloud registration problem with synthetic data sets and LiDAR odometry with open source real-world data sets. We show that the existing approach needs an additional manual tuning of a residual scale parameter which our method directly learns from data and has similar or better performance. We further present the idea of decoupling scale and shape parameters to improve performance of the algorithm. We give detailed analysis of our algorithm along with its comparison with similar well-known algorithms from literature to show the benefits of the proposed approach.

Submitted 24 June, 2023; v1 submitted 7 May, 2022; originally announced June 2022.

Comments: Accepted for publication in IEEE Transactions on Aerospace and Electronic Systems

arXiv:2206.06838 [pdf]

Architectural patterns for handling runtime uncertainty of data-driven models in safety-critical perception

Authors: Janek Groß, Rasmus Adler, Michael Kläs, Jan Reich, Lisa Jöckel, Roman Gansch

Abstract: Data-driven models (DDM) based on machine learning and other AI techniques play an important role in the perception of increasingly autonomous systems. Due to the merely implicit definition of their behavior mainly based on the data used for training, DDM outputs are subject to uncertainty. This poses a challenge with respect to the realization of safety-critical perception tasks by means of DDMs. A promising approach to tackling this challenge is to estimate the uncertainty in the current situation during operation and adapt the system behavior accordingly. In previous work, we focused on runtime estimation of uncertainty and discussed approaches for handling uncertainty estimations. In this paper, we present additional architectural patterns for handling uncertainty. Furthermore, we evaluate the four patterns qualitatively and quantitatively with respect to safety and performance gains. For the quantitative evaluation, we consider a distance controller for vehicle platooning where performance gains are measured by considering how much the distance can be reduced in different operational situations. We conclude that the consideration of context information of the driving situation makes it possible to accept more or less uncertainty depending on the inherent risk of the situation, which results in performance gains.

Submitted 14 June, 2022; originally announced June 2022.

arXiv:2205.14247 [pdf, other]

Ainur: A Framework for Repeatable End-to-End Wireless Edge Computing Testbed Research

Authors: Manuel Olguín Muñoz, Seyed Samie Mostafavi, Vishnu N. Moothedath, James Gross

Abstract: Experimental research on wireless networking in combination with edge and cloud computing has been the subject of explosive interest in the last decade. This development has been driven by the increasing complexity of modern wireless technologies and the extensive softwarization of these through projects such as an Open Radio Access Network (O-RAN). In this context, a number of small- to mid-scale testbeds have emerged, employing a variety of technologies to target a wide array of use-cases and scenarios in the context of novel mobile communication technologies such as 5G and beyond-5G. Little work, however, has yet been devoted to developing a standard framework for wireless testbed automation which is hardware-agnostic and compatible with edge- and cloud-native technologies. Such a solution would simplify the development of new testbeds by completely or partially removing the requirement for custom management and orchestration software. In this paper, we present the first such mostly hardware-agnostic wireless testbed automation framework, Ainur. It is designed to configure, manage, orchestrate, and deploy workloads from an end-to-end perspective. Ainur is built on top of cloud-native technologies such as Docker, and is provided as FOSS to the community through the KTH-EXPECA/Ainur repository on GitHub. We demonstrate the utility of the platform with a series of scenarios, showcasing in particular its flexibility with respect to physical link definition, computation placement, and automation of arbitrarily complex experimental scenarios.

Submitted 31 May, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: 6 pages, 6 figures, demo session paper

arXiv:2205.01541 [pdf, other]

doi 10.1109/ISCAS48785.2022.9937567

Efficient Fine-Tuning of BERT Models on the Edge

Authors: Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

Abstract: Resource-constrained devices are increasingly the deployment targets of machine learning applications. Static models, however, do not always suffice for dynamic environments. On-device training of models allows for quick adaptability to new scenarios. With the increasing size of deep neural networks, as noted with the likes of BERT and other natural language processing models, comes increased resource requirements, namely memory, computation, energy, and time. Furthermore, training is far more resource intensive than inference. Resource-constrained on-device learning is thus doubly difficult, especially with large BERT-like models. By reducing the memory usage of fine-tuning, pre-trained BERT models can become efficient enough to fine-tune on resource-constrained devices. We propose Freeze And Reconfigure (FAR), a memory-efficient training regime for BERT-like models that reduces the memory usage of activation maps during fine-tuning by avoiding unnecessary parameter updates. FAR reduces fine-tuning time on the DistilBERT model and CoLA dataset by 30%, and time spent on memory operations by 47%. More broadly, reductions in metric performance on the GLUE and SQuAD datasets are around 1% on average.

Submitted 3 May, 2022; originally announced May 2022.

Comments: 4 pages, 2 figures, 3 tables. To be published in ISCAS 2022 and made available on IEEE Xplore

arXiv:2205.00862 [pdf, other]

doi 10.4230/LIPIcs.ITP.2022.17

Accelerating Verified-Compiler Development with a Verified Rewriting Engine

Authors: Jason Gross, Andres Erbsen, Jade Philipoom, Miraya Poddar-Agrawal, Adam Chlipala

Abstract: Compilers are a prime target for formal verification, since compiler bugs invalidate higher-level correctness guarantees, but compiler changes may become more labor-intensive to implement, if they must come with proof patches. One appealing approach is to present compilers as sets of algebraic rewrite rules, which a generic engine can apply efficiently. Now each rewrite rule can be proved separately, with no need to revisit past proofs for other parts of the compiler. We present the first realization of this idea, in the form of a framework for the Coq proof assistant. Our new Coq command takes normal proved theorems and combines them automatically into fast compilers with proofs. We applied our framework to improve the Fiat Cryptography toolchain for generating cryptographic arithmetic, producing an extracted command-line compiler that is about 1000$\times$ faster while actually featuring simpler compiler-specific proofs.

Submitted 18 July, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: 13th International Conference on Interactive Theorem Proving (ITP 2022)

ACM Class: F.3.1; D.2.4; F.4.2; D.3.4

arXiv:2205.00030 [pdf, other]

GRAND for Rayleigh Fading Channels

Authors: Syed Mohsin Abbas, Marwan Jalaleddine, Warren J. Gross

Abstract: Guessing Random Additive Noise Decoding (GRAND) is a code-agnostic decoding technique for short-length and high-rate channel codes. GRAND tries to guess the channel noise by generating test error patterns (TEPs), and the sequence of the TEPs is the main difference between different GRAND variants. In this work, we extend the application of GRAND to multipath frequency non-selective Rayleigh fading communication channels, and we refer to this GRAND variant as Fading-GRAND. The proposed Fading-GRAND adapts its TEP generation to the fading conditions of the underlying communication channel, outperforming traditional channel code decoders in scenarios with $L$ spatial diversity branches as well as scenarios with no diversity. Numerical simulation results show that Fading-GRAND outperforms the traditional Berlekamp-Massey (B-M) decoder for decoding BCH code $(127,106)$ and BCH code $(127,113)$ by $\mathbf{0.5\sim6.5}$ dB at a target FER of $10^{-7}$. Similarly, Fading-GRAND outperforms GRANDAB, the hard-input variation of GRAND, by $0.2\sim8$ dB at a target FER of $10^{-7}$ with CRC $(128,104)$ code and RLC $(128,104)$. Furthermore, the average complexity of Fading-GRAND, at $\frac{E_b}{N_0}$ corresponding to target FER of $10^{-7}$, is $\frac{1}{2}\times\sim \frac{1}{46}\times$ the complexity of GRANDAB.

Submitted 30 November, 2022; v1 submitted 29 April, 2022; originally announced May 2022.

Comments: To appear in IEEE Global Communications Conference (GLOBECOM) 2022 Workshops

Journal ref: GLOBECOM 2022 Workshops

arXiv:2204.12758 [pdf, other]

Advantages of maintaining a multi-task project-specific bot: an experience report

Authors: Théo Zimmermann, Julien Coolen, Jason Gross, Pierre-Marie Pédrot, Gaëtan Gilbert

Abstract: Bots are becoming a popular method for automating basic everyday tasks in many software projects. This is true in particular because of the availability of many off-the-shelf task-specific bots that teams can quickly adopt (which are sometimes completed with additional task-specific custom bots). Based on our experience in the Coq project, where we have developed and maintained a multi-task project-specific bot, we argue that this alternative approach to project automation should receive more attention because it strikes a good balance between productivity and adaptability. In this article, we describe the kind of automation that our bot implements, what advantages we have gained by maintaining a project-specific bot, and the technology and architecture choices that have made it possible. We draw conclusions that should generalize to other medium-sized software teams willing to invest in project automation without disrupting their workflows.

Submitted 27 April, 2022; originally announced April 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2112.07365

arXiv:2204.00118 [pdf, other]

Designing for emotion regulation interventions: an agenda for HCI theory and research

Authors: Petr Slovak, Alissa N. Antle, Nikki Theofanopoulou, Claudia Daudén Roquet, James J Gross, Katherine Isbister

Abstract: There is a growing interest in HCI to envision, design, and evaluate technology-enabled interventions that support users' emotion regulation. This interest stems in part from increased recognition that the ability to regulate emotions is critical to mental health, and that a lack of effective emotion regulation is a transdiagnostic factor for mental illness. However, the potential to combine innovative HCI designs with the theoretical grounding and state-of-art interventions from psychology has yet to be fully realised. In this paper, we synthesise HCI work on emotion regulation interventions and propose a three-part framework to guide technology designers in making: (i) theory-informed decisions about intervention targets; (ii) strategic decisions regarding the technology-enabled intervention mechanisms to be included in the system; and (iii) practical decisions around previous implementations of the selected intervention components. We show how this framework can both systematise HCI work to date and suggest a research agenda for future work.

Submitted 4 April, 2022; v1 submitted 31 March, 2022; originally announced April 2022.

Comments: Currently under review

arXiv:2202.13823 [pdf, other]

Automatic Test-Case Reduction in Proof Assistants: A Case Study in Coq

Authors: Jason Gross, Théo Zimmermann, Miraya Poddar-Agrawal, Adam Chlipala

Abstract: As the adoption of proof assistants increases, there is a need for efficiency in identifying, documenting, and fixing compatibility issues that arise from proof assistant evolution. We present the Coq Bug Minimizer, a tool for reproducing buggy behavior with minimal and standalone files, integrated with coqbot to trigger automatically on Coq reverse CI failures. Our tool eliminates the overhead of having to download, set up, compile, and then explore and understand large developments, enabling Coq developers to easily obtain modular test-case files for fast experimentation. In this paper, we describe insights about how test-case reduction is different in Coq than in traditional compilers. We expect that our insights will generalize to other proof assistants. We evaluate the Coq Bug Minimizer on over 150 CI failures. Our tool succeeds in reducing failures to smaller test cases roughly 75% of the time. The minimizer produces a fully standalone test case 89% of the time, and it is on average about one-third the size of the original test. The average reduced test case compiles in 1.25 seconds, with 75% taking under half a second.

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2202.12422 [pdf, other]

Standard Deviation-Based Quantization for Deep Neural Networks

Authors: Amir Ardakani, Arash Ardakani, Brett Meyer, James J. Clark, Warren J. Gross

Abstract: Quantization of deep neural networks is a promising approach that reduces the inference cost, making it feasible to run deep networks on resource-restricted devices. Inspired by existing methods, we propose a new framework to learn the quantization intervals (discrete values) using the knowledge of the network's weight and activation distributions, i.e., standard deviation. Furthermore, we propose a novel base-2 logarithmic quantization scheme to quantize weights to power-of-two discrete values. Our proposed scheme allows us to replace resource-hungry high-precision multipliers with simple shift-add operations. According to our evaluations, our method outperforms existing work on CIFAR10 and ImageNet datasets and even achieves better accuracy performance with 3-bit weights and activations when compared to the full-precision models. Moreover, our scheme simultaneously prunes the network's parameters and allows us to flexibly adjust the pruning ratio during the quantization process.

Submitted 24 February, 2022; originally announced February 2022.

arXiv:2201.00242 [pdf]

doi 10.1145/3453142.3493507

Industrial Edge-based Cyber-Physical Systems -- Application Needs and Concerns for Realization

Authors: Martin Törngren, Haydn Thompson, Erik Herzog, Rafia Inam, James Gross, György Dán

Abstract: Industry is moving towards advanced Cyber-Physical Systems (CPS), with trends in smartness, automation, connectivity and collaboration. We examine the drivers and requirements for the use of edge computing in critical industrial applications. Our purpose is to provide a better understanding of industrial needs and to initiate a discussion on what role edge computing could take, complementing current industrial and embedded systems, and the cloud. Four domains are chosen for analysis with representative use-cases: manufacturing, transportation, the energy sector and networked applications in the defense domain. We further discuss challenges, open issues and suggested directions that are needed to pave the way for the use of edge computing in industrial CPS.

Submitted 1 January, 2022; originally announced January 2022.

Comments: 7 pages, 1 figure

arXiv:2112.07872 [pdf, other]

doi 10.33012/2021.17938

A Comparison of Robust Kalman Filters for Improving Wheel-Inertial Odometry in Planetary Rovers

Authors: Shounak Das, Cagri Kilic, Ryan Watson, Jason Gross

Abstract: This paper compares the performance of adaptive and robust Kalman filter algorithms in improving wheel-inertial odometry on low featured rough terrain. Approaches include classical adaptive and robust methods as well as variational methods, which are evaluated experimentally on a wheeled rover in terrain similar to what would be encountered in planetary exploration. Variational filters show improved solution accuracy compared to the classical adaptive filters and are able to handle erroneous wheel odometry measurements and keep good localization for longer distances without significant drift. We also show how varying the parameters affects localization performance.

Submitted 14 December, 2021; originally announced December 2021.

arXiv:2112.07794 [pdf, other]

Review of Factor Graphs for Robust GNSS Applications

Authors: Shounak Das, Ryan Watson, Jason Gross

Abstract: Factor graphs have recently emerged as an alternative solution method for GNSS positioning. In this article, we review how factor graphs are implemented in GNSS, some of their advantages over Kalman Filters, and their importance in making positioning solutions more robust to degraded measurements. We also talk about how factor graphs can be an important tool for the field radio-navigation community.

Submitted 15 June, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

arXiv:2112.07365 [pdf, other]

Extending the team with a project-specific bot

Authors: Théo Zimmermann, Julien Coolen, Jason Gross, Pierre-Marie Pédrot, Gaëtan Gilbert

Abstract: While every other software team is adopting off-the-shelf bots to automate everyday tasks, the Coq team has made a different choice by developing and maintaining a project-specific bot from the ground up. In this article, we describe the reasons for this choice, what kind of automation this has allowed us to implement, how the many features of this custom bot have evolved based on internal feedback, and the technology and architecture choices that have made it possible.

Submitted 14 December, 2021; originally announced December 2021.

arXiv:2112.07176 [pdf, other]

doi 10.33012/2021.18064

ZUPT Aided GNSS Factor Graph with Inertial Navigation Integration for Wheeled Robots

Authors: Cagri Kilic, Shounak Das, Eduardo Gutierrez, Ryan Watson, Jason Gross

Abstract: In this work, we demonstrate the importance of zero velocity information for global navigation satellite system (GNSS) based navigation. The effectiveness of using the zero velocity information with zero velocity update (ZUPT) for inertial navigation applications has been shown in the literature. Here we leverage this information and add it as a position constraint in a GNSS factor graph. We also compare its performance to a GNSS/inertial navigation system (INS) coupled factor graph. We tested our ZUPT aided factor graph method on three datasets and compared it with the GNSS-only factor graph.

Submitted 14 December, 2021; originally announced December 2021.

Comments: 9 pages, 8 figures, Preprint Version. Published in ION GNSS+ 2021

arXiv:2111.07693 [pdf, other]

doi 10.1016/j.compstruc.2021.106698

A massless boundary component mode synthesis method for elastodynamic contact problems

Authors: Carlo Monjaraz-Tec, Johann Gross, Malte Krack

Abstract: We propose to combine the ideas of mass redistribution and component mode synthesis. More specifically, we employ the MacNeal method, which readily leads to a singular mass matrix, and an accordingly modified version of the Craig-Bampton method. Besides obtaining a massless boundary, we achieve a drastic reduction of the mathematical model order in this way compared to the parent finite element model. Contact is modeled using set-valued laws and time stepping is carried out with a semi-explicit scheme. We assess the method's computational performance by a series of benchmarks, including both frictionless and frictional contact. The results indicate that the proposed method achieves excellent energy conservation properties and superior convergence behavior. It reduces the spurious oscillations and decreases the computational effort by about 1-2 orders of magnitude compared to the current state of the art (mass-carrying component mode synthesis method). We believe that the computational performance and favorable energy conservation properties will be valuable for the prediction of vibro-impact processes and physical damping.

Submitted 15 November, 2021; originally announced November 2021.

Comments: The final version of this article is available online at https://doi.org/10.1016/j.compstruc.2021.106698

arXiv:2110.13776 [pdf, other]

doi 10.1109/TVLSI.2022.3153605

High-Throughput and Energy-Efficient VLSI Architecture for Ordered Reliability Bits GRAND

Authors: Syed Mohsin Abbas, Thibaud Tonnellier, Furkan Ercan, Marwan Jalaleddine, Warren J. Gross

Abstract: Ultra-reliable low-latency communication (URLLC), a major 5G New-Radio use case, is the key enabler for applications with strict reliability and latency requirements. These applications necessitate the use of short-length and high-rate codes. Guessing Random Additive Noise Decoding (GRAND) is a recently proposed Maximum Likelihood (ML) decoding technique for these short-length and high-rate codes. Rather than decoding the received vector, GRAND tries to infer the noise that corrupted the transmitted codeword during transmission through the communication channel. As a result, GRAND can decode any code, structured or unstructured. GRAND has hard-input as well as soft-input variants. Among these variants, Ordered Reliability Bits GRAND (ORBGRAND) is a soft-input variant that outperforms hard-input GRAND and is suitable for parallel hardware implementation. This work reports the first hardware architecture for ORBGRAND, which achieves an average throughput of up to $42.5$ Gbps for a code length of $128$ at a target FER of $10^{-7}$. Furthermore, the proposed hardware can be used to decode any code as long as the length and rate constraints are met. In comparison to GRANDAB, a hard-input variant of GRAND, the proposed architecture enhances decoding performance by at least $2$ dB. When compared to the state-of-the-art fast dynamic successive cancellation flip decoder (Fast-DSCF) using a 5G polar $(128,105)$ code, the proposed ORBGRAND VLSI implementation has $49\times$ higher average throughput, $32\times$ more energy efficiency, and $5\times$ more area efficiency while maintaining similar decoding performance.

Submitted 11 March, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted for inclusion in IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2022. For the updated version, please see IEEE Xplore

Journal ref: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2022

arXiv:2109.12239 [pdf, other]

doi 10.1109/ACCESS.2021.3140151

Fast Successive-Cancellation List Flip Decoding of Polar Codes

Authors: Nghia Doan, Seyyed Ali Hashemi, Warren J. Gross

Abstract: This work presents a fast successive-cancellation list flip (Fast-SCLF) decoding algorithm for polar codes that addresses the high latency issue associated with the successive-cancellation list flip (SCLF) decoding algorithm. We first propose a bit-flipping strategy tailored to the state-of-the-art fast successive-cancellation list (FSCL) decoding that avoids tree-traversal in the binary tree representation of SCLF, thus reducing the latency of the decoding process. We then derive a parameterized path selection error model to accurately estimate the bit index at which the correct decoding path is eliminated from the initial FSCL decoding. The trainable parameter is optimized online based on an efficient supervised learning framework. Simulation results show that for a polar code of length 512 with 256 information bits, with similar error-correction performance and memory consumption, the proposed Fast-SCLF decoder reduces up to $73.4\%$ of the average decoding latency of the SCLF decoder with the same list size at the frame error rate of $10^{-4}$, while incurring a maximum computational complexity overhead of $27.6\%$. For the same polar code of length 512 with 256 information bits and at practical signal-to-noise ratios, the proposed decoder with list size 4 reduces $89.3\%$ and $43.7\%$ of the average complexity and decoding latency of the FSCL decoder with list size 32 (FSCL-32), respectively, while also reducing $83.2\%$ of the memory consumption of FSCL-32. The significant improvements of the proposed decoder come at the cost of $0.07$ dB error-correction performance degradation compared with FSCL-32.

Submitted 23 January, 2022; v1 submitted 24 September, 2021; originally announced September 2021.

Comments: Published in IEEE Access, Volume: 10, Page(s): 5568 - 5584, Date of Publication: 04 January 2022

arXiv:2109.12225 [pdf, other]

doi 10.1109/TVLSI.2022.3223692

List-GRAND: A practical way to achieve Maximum Likelihood Decoding

Authors: Syed Mohsin Abbas, Marwan Jalaleddine, Warren J. Gross

Abstract: Guessing Random Additive Noise Decoding (GRAND) is a recently proposed universal Maximum Likelihood (ML) decoder for short-length and high-rate linear block-codes. Soft-GRAND (SGRAND) is a prominent soft-input GRAND variant, outperforming the other GRAND variants in decoding performance; nevertheless, SGRAND is not suitable for parallel hardware implementation. Ordered Reliability Bits-GRAND (ORBGRAND) is another soft-input GRAND variant that is suitable for parallel hardware implementation; however, it has lower decoding performance than SGRAND. In this paper, we propose List-GRAND (LGRAND), a technique for enhancing the decoding performance of ORBGRAND to match the ML decoding performance of SGRAND. Numerical simulation results show that LGRAND enhances ORBGRAND's decoding performance by $0.5-0.75$ dB for channel-codes of various classes at a target FER of $10^{-7}$. For linear block codes of length $127/128$ and different code-rates, LGRAND's VLSI implementation can achieve an average information throughput of $47.27-51.36$ Gbps. In comparison to ORBGRAND's VLSI implementation, the proposed LGRAND hardware has a $4.84\%$ area overhead.

Submitted 2 December, 2022; v1 submitted 24 September, 2021; originally announced September 2021.

Comments: This article has been accepted for publication in IEEE Transactions on Very Large Scale Integration (VLSI) Systems. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/TVLSI.2022.3223692

Journal ref: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2022

Showing 1–50 of 174 results for author: Groß, J