Search | arXiv e-print repository

arXiv:2404.19264

DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

Authors: Xiaoyu Huang, Yufeng Chi, Ruofeng Wang, Zhongyu Li, Xue Bin Peng, Sophia Shao, Borivoje Nikolic, Koushil Sreenath

Abstract: This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged robot locomotion, especially with multiple skills in a single policy, presents significant challenges for prior online reinforcement learning methods. To address this challenge, we propose a novel, scalable framework that leverages diffusion models to directly learn from offline multimodal datasets with a diverse set of locomotion skills. With design choices tailored for real-time control in dynamical systems, including receding horizon control and delayed inputs, DiffuseLoco is capable of reproducing multimodality in performing various locomotion skills, zero-shot transfer to real quadrupedal robots, and it can be deployed on edge computing devices. Furthermore, DiffuseLoco demonstrates free transitions between skills and robustness against environmental variations. Through extensive benchmarking in real-world experiments, DiffuseLoco exhibits better stability and velocity tracking performance compared to prior reinforcement learning and non-diffusion-based behavior cloning baselines. The design choices are validated via comprehensive ablation studies. This work opens new possibilities for scaling up learning-based legged locomotion controllers through the scaling of large, expressive models and diverse offline datasets.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2402.14892

Novelty Detection on Radio Astronomy Data using Signatures

Authors: Paola Arrubarrena, Maud Lemercier, Bojan Nikolic, Terry Lyons, Thomas Cass

Abstract: We introduce SigNova, a new semi-supervised framework for detecting anomalies in streamed data. While our initial examples focus on detecting radio-frequency interference (RFI) in digitized signals within the field of radio astronomy, it is important to note that SigNova's applicability extends to any type of streamed data. The framework comprises three primary components. Firstly, we use the signature transform to extract a canonical collection of summary statistics from observational sequences. This allows us to represent variable-length visibility samples as finite-dimensional feature vectors. Secondly, each feature vector is assigned a novelty score, calculated as the Mahalanobis distance to its nearest neighbor in an RFI-free training set. By thresholding these scores we identify observation ranges that deviate from the expected behavior of RFI-free visibility samples without relying on stringent distributional assumptions. Thirdly, we integrate this anomaly detector with Pysegments, a segmentation algorithm, to localize consecutive observations contaminated with RFI, if any. This approach provides a compelling alternative to classical windowing techniques commonly used for RFI detection. Importantly, the complexity of our algorithm depends on the RFI pattern rather than on the size of the observation window. We demonstrate how SigNova improves the detection of various types of RFI (e.g., broadband and narrowband) in time-frequency visibility data. We validate our framework on the Murchison Widefield Array (MWA) telescope and simulated data and the Hydrogen Epoch of Reionization Array (HERA).

Submitted 12 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

MSC Class: 60L10; 60L20

arXiv:2305.05843

doi 10.1109/HPCA56546.2023.10071035

MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks

Authors: Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanović, Borivoje Nikolić, Yakun Sophia Shao

Abstract: Driven by the wide adoption of deep neural networks (DNNs) across different application domains, multi-tenancy execution, where multiple DNNs are deployed simultaneously on the same hardware, has been proposed to satisfy the latency requirements of different applications while improving the overall system utilization. However, multi-tenancy execution could lead to undesired system-level resource contention, causing quality-of-service (QoS) degradation for latency-critical applications. To address this challenge, we propose MoCA, an adaptive multi-tenancy system for DNN accelerators. Unlike existing solutions that focus on compute resource partition, MoCA dynamically manages shared memory resources of co-located applications to meet their QoS targets. Specifically, MoCA leverages the regularities in both DNN operators and accelerators to dynamically modulate memory access rates based on their latency targets and user-defined priorities so that co-located applications get the resources they demand without significantly starving their co-runners. We demonstrate that MoCA improves the satisfaction rate of the service level agreement (SLA) up to 3.9x (1.8x average), system throughput by 2.3x (1.7x average), and fairness by 1.3x (1.2x average), compared to prior work.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 2023 HPCA, Reproducibility Badges (Open Research Objects, Research Objects Reviewed, Results Reproduced)

arXiv:1911.09925

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration

Authors: Hasan Genc, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert Ou, Colin Schmidt, Samuel Steffl, John Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao

Abstract: DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments. This makes it difficult to appreciate the impact of System-on-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance/energy-efficiency. To address this challenge, we present Gemmini, an open-source*, full-stack DNN accelerator generator. Gemmini generates a wide design-space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects. Gemmini-generated accelerators have also been fabricated, delivering up to three orders-of-magnitude speedups over high-performance CPUs on various DNN benchmarks. * https://github.com/ucb-bar/gemmini

Submitted 9 July, 2021; v1 submitted 22 November, 2019; originally announced November 2019.

Comments: To appear at the 58th IEEE/ACM Design Automation Conference (DAC), December 2021, San Francisco, CA, USA

arXiv:1806.08777

Wireless Channel Dynamics and Robustness for Ultra-Reliable Low-Latency Communications

Authors: Vasuki Narasimha Swamy, Paul Rigge, Gireeja Ranade, Borivoje Nikolic, Anant Sahai

Abstract: Interactive, immersive and critical applications demand ultra-reliable low-latency communication (URLLC). To build wireless communication systems that can support these applications, understanding the characteristics of the wireless medium is paramount. Although wireless channel characteristics and dynamics have been extensively studied, it is important to revisit these concepts in the context of the strict demands of low latency and ultra-reliability. In this paper, we bring a modeling approach from robust control to wireless communication -- the wireless channel characteristics are given a nominal model around which we allow for some quantified uncertainty. We propose certain key "directions" along which to bound model uncertainty that are relevant to URLLC. For the nominal model, we take an in-depth look at wireless channel characteristics such as spatial and temporal correlations based on Jakes' model. Contrary to what has been claimed in the literature, we find that standard Rayleigh fading processes are not bandlimited. This has significant implications on the predictability of channels. We also find that under reasonable conditions the spatial correlation of channels provide a fading distribution that is not too far off from an independent spatial fading model. Additionally, we look at the impact of these channel models on cooperative communication based systems. We find that while spatial-diversity-based techniques are necessary to combat the adverse effects of fading, time-diversity-based techniques are necessary to be robust against unmodeled errors. Robust URLLC systems need to operate with both an SNR margin and a time/repetition margin.

Submitted 22 June, 2018; originally announced June 2018.

Comments: Submitted to IEEE JSAC Special Issue on Ultra-Reliable Low-Latency Communications in Wireless Networks

arXiv:1803.05143

Network Coding for Real-time Wireless Communication for Automation

Authors: Vasuki Narasimha Swamy, Paul Rigge, Gireeja Ranade, Anant Sahai, Borivoje Nikolic

Abstract: Real-time applications require latencies on the order of a millisecond with very high reliabilities, paralleling the requirements for high-performance industrial control. Current wireless technologies like WiFi, Bluetooth, LTE, etc. are unable to meet these stringent latency and reliability requirements, forcing the use of wired systems. This paper introduces a wireless communication protocol based on network coding that in conjunction with cooperative communication techniques builds the necessary diversity to achieve the target reliability. The proposed protocol is analyzed using a communication theoretic delay-limited-capacity framework and compared to proposed protocols without network coding. The results show that for larger network sizes or payloads employing network coding lowers the minimum SNR required to achieve the target reliability. For a scenario inspired by an industrial printing application with $30$ nodes in the control loop, aggregate throughput of $4.8$ Mb/s, $20$MHz of bandwidth and cycle time under $2$ ms, the protocol can robustly achieve a system probability of error better than $10^{-9}$ with a nominal SNR less than $2$ dB under ideal channel conditions.

Submitted 14 March, 2018; originally announced March 2018.

Comments: A preliminary version of this work appeared at IEEE WCNC 2016

arXiv:1609.02968

Real-time Cooperative Communication for Automation over Wireless

Authors: Vasuki Narasimha Swamy, Sahaana Suri, Paul Rigge, Matthew Weiner, Gireeja Ranade, Anant Sahai, Borivoje Nikolic

Abstract: High-performance industrial automation systems rely on tens of simultaneously active sensors and actuators and have stringent communication latency and reliability requirements. Current wireless technologies like WiFi, Bluetooth, and LTE are unable to meet these requirements, forcing the use of wired communication in industrial control systems. This paper introduces a wireless communication protocol that capitalizes on multiuser diversity and cooperative communication to achieve the ultra-reliability with a low-latency constraint. Our protocol is analyzed using the communication-theoretic delay-limited-capacity framework and compared to baseline schemes that primarily exploit frequency diversity. For a scenario inspired by an industrial printing application with thirty nodes in the control loop, 20B messages transmitted between pairs of nodes and a cycle time of $2$ ms, an idealized protocol can achieve a cycle failure probability (probability that any packet in a cycle is not successfully delivered) lower than $10^{-9}$ with nominal SNR below 5 dB in a 20MHz wide channel.

Submitted 23 January, 2017; v1 submitted 9 September, 2016; originally announced September 2016.

Comments: A preliminary version of this work appeared at IEEE International Conference on Communications 2015

arXiv:1606.02942

Analysis of buffering effects on hard real-time priority-preemptive wormhole networks

Authors: Leandro Soares Indrusiak, Alan Burns, Borislav Nikolic

Abstract: There are several approaches to analyse the worst-case response times of sporadic packets transmitted over priority-preemptive wormhole networks. In this paper, we provide an overview of the different approaches, discuss their strengths and weaknesses, and propose an approach that captures all effects considered by previous approaches while providing tight yet safe upper bounds for packet response times. We specifically address the problems created by buffering and backpressure in wormhole networks, which amplifies the problem of indirect interference in a way that has not been considered by the early analysis approaches. Didactic examples and large-scale experiments with synthetically generated packet flow sets provide evidence of the strength of the proposed approach.

Submitted 9 June, 2016; originally announced June 2016.

arXiv:1605.07888

A Tighter Real-Time Communication Analysis for Wormhole-Switched Priority-Preemptive NoCs

Authors: Borislav Nikolic, Leandro Soares Indrusiak, Stefan M. Petters

Abstract: Simulations and runtime measurements are some of the methods which can be used to evaluate whether a given NoC-based platform can accommodate application workload and fulfil its timing requirements. Yet, these techniques are often time-consuming, and hence can evaluate only a limited set of scenarios. Therefore, these approaches are not suitable for safety-critical and hard real-time systems, where one of the fundamental requirements is to provide strong guarantees that all timing requirements will always be met, even in the worst-case conditions. For such systems the analytic-based real-time analysis is the only viable approach. In this paper the focus is on the real-time communication analysis for wormhole-switched priority-preemptive NoCs. First, we elaborate on the existing analysis and identify one source of pessimism. Then, we propose an extension to the analysis, which efficiently overcomes this limitation, and allows for a less pessimistic analysis. Finally, through a comprehensive experimental evaluation, we compare the newly proposed approach against the existing one, and also observe how the trends change with different traffic parameters.

Submitted 25 May, 2016; originally announced May 2016.

arXiv:1410.6121

doi 10.1142/S0217979214300217

The Nonequilibrium Many-Body Problem as a paradigm for extreme data science

Authors: J. K. Freericks, B. K. Nikolic, O. Frieder

Abstract: Generating big data pervades much of physics. But some problems, which we call extreme data problems, are too large to be treated within big data science. The nonequilibrium quantum many-body problem on a lattice is just such a problem, where the Hilbert space grows exponentially with system size and rapidly becomes too large to fit on any computer (and can be effectively thought of as an infinite-sized data set). Nevertheless, much progress has been made with computational methods on this problem, which serve as a paradigm for how one can approach and attack extreme data problems. In addition, viewing these physics problems from a computer-science perspective leads to new approaches that can be tried to solve them more accurately and for longer times. We review a number of these different ideas here.

Submitted 9 December, 2014; v1 submitted 22 October, 2014; originally announced October 2014.

Comments: 33 pages, 7 figures, invited review for Int. J. Mod. Phys. B; published version with additional references

Journal ref: Int J. Mod. Phys. B 28, 1430021 (2014)

arXiv:1209.4679

doi 10.1109/JSAC.2013.130807

Coding and System Design for Quantize-Map-and-Forward Relaying

Authors: Vinayak Nagpal, I-Hsiang Wang, Milos Jorgovanovic, David Tse, Borivoje Nikolic

Abstract: In this paper we develop a low-complexity coding scheme and system design framework for the half duplex relay channel based on the Quantize-Map-and-Forward (QMF) relaying scheme. The proposed framework allows linear complexity operations at all network terminals. We propose the use of binary LDPC codes for encoding at the source and LDGM codes for mapping at the relay. We express joint decoding at the destination as a belief propagation algorithm over a factor graph. This graph has the LDPC and LDGM codes as subgraphs connected via probabilistic constraints that model the QMF relay operations. We show that this coding framework extends naturally to the high SNR regime using bit interleaved coded modulation (BICM). We develop density evolution analysis tools for this factor graph and demonstrate the design of practical codes for the half-duplex relay channel that perform within 1dB of information theoretic QMF threshold.

Submitted 20 September, 2012; originally announced September 2012.

Comments: To appear in IEEE Journal of Selected Areas in Communication, Theories and Methods for Advanced Wireless Relays Part 2, 2012

Journal ref: IEEE Journal of Selected Areas in Communication, 2013

arXiv:0901.2164

doi 10.1109/ISIT.2009.5205885

Cooperative Multiplexing in the Multiple Antenna Half Duplex Relay Channel

Authors: Vinayak Nagpal, Sameer Pawar, David Tse, Borivoje Nikolic

Abstract: Cooperation between terminals has been proposed to improve the reliability and throughput of wireless communication. While recent work has shown that relay cooperation provides increased diversity, increased multiplexing gain over that offered by direct link has largely been unexplored. In this work we show that cooperative multiplexing gain can be achieved by using a half duplex relay. We capture relative distances between terminals in the high SNR diversity multiplexing tradeoff (DMT) framework. The DMT performance is then characterized for a network having a single antenna half-duplex relay between a single-antenna source and two-antenna destination. Our results show that the achievable multiplexing gain using cooperation can be greater than that of the direct link and is a function of the relative distance between source and relay compared to the destination. Moreover, for multiplexing gains less than 1, a simple scheme of the relay listening 1/3 of the time and transmitting 2/3 of the time can achieve the 2 by 2 MIMO DMT.

Submitted 14 January, 2009; originally announced January 2009.

Comments: 5 pages, 5 figures submitted to ISIT 2009

Journal ref: Information Theory, 2009. ISIT 2009. IEEE International Symposium on, vol., no., pp.1438-1442, June 28 2009-July 3 2009

Showing 1–12 of 12 results for author: Nikolic, B