-
Implications of Noise in Resistive Memory on Deep Neural Networks for Image Classification
Authors:
Yannick Emonds,
Kai Xi,
Holger Fröning
Abstract:
Resistive memory is a promising alternative to SRAM, but is also an inherently unstable device that requires substantial effort to ensure correct read and write operations. To avoid the associated costs in terms of area, time and energy, the present work is concerned with exploring how much noise in memory operations can be tolerated by image classification tasks based on neural networks. We introduce a special noisy operator that mimics the noise in an exemplary resistive memory unit, explore the resilience of convolutional neural networks on the CIFAR-10 classification task, and discuss a couple of countermeasures to improve this resilience.
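The abstract names a noisy operator but does not specify its noise model. As a minimal sketch, assuming (hypothetically) that weights are stored as 8-bit two's-complement codes and that every stored bit flips independently with probability `p` on each read, a noisy dot product could look like this. The names `quantize`, `flip_bits`, `noisy_dot` and the bit-flip model itself are illustrative assumptions, not the authors' operator.

```python
import random
from typing import List

BITS = 8  # hypothetical: weights stored as 8-bit two's-complement codes


def quantize(w: float, scale: float) -> int:
    """Map a float weight to a signed 8-bit integer code (clamped to [-128, 127])."""
    q = round(w / scale)
    return max(-128, min(127, q))


def flip_bits(code: int, p: float, rng: random.Random) -> int:
    """Hypothetical read noise: flip each stored bit independently with probability p."""
    raw = code & 0xFF  # the bit pattern as it would sit in memory
    for bit in range(BITS):
        if rng.random() < p:
            raw ^= 1 << bit
    return raw - 256 if raw >= 128 else raw  # reinterpret as a signed value


def noisy_dot(weights: List[float], x: List[float], scale: float,
              p: float, rng: random.Random) -> float:
    """Dot product in which every weight read passes through the noisy memory model."""
    acc = 0.0
    for w, xi in zip(weights, x):
        code = flip_bits(quantize(w, scale), p, rng)
        acc += code * scale * xi
    return acc
```

With `p = 0` this reduces to an ordinary quantized dot product; sweeping `p` upward and re-measuring classification accuracy is one simple way to probe how much read noise a network tolerates.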
Submitted 11 January, 2024;
originally announced January 2024.
-
Intelligent Anomaly Detection and Mitigation in Data Centers
Authors:
Ashkan Aghdai,
Kang Xi,
H. Jonathan Chao
Abstract:
Data centers play a key role in today's Internet. Cloud applications are mainly hosted on multi-tenant warehouse-scale data centers. Anomalies pose a serious threat to data centers' operations: if not controlled properly, a simple anomaly can spread throughout the data center and result in a cascading failure. Amazon AWS has recently been affected by such incidents. Although some solutions have been proposed to detect anomalies and prevent cascading failures, they mainly rely on application-specific metrics and case-based diagnosis to detect the anomalies. Given the variety of applications on a multi-tenant data center, these solutions are not capable of detecting anomalies in a timely manner. In this paper we design an application-agnostic anomaly detection scheme. More specifically, our design uses a highly distributed data mining scheme over network-level traffic metrics to detect anomalies. Once anomalies are detected, simple actions are taken to mitigate the damage. This confines errors and prevents cascading failures before administrators intervene.
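The abstract does not describe the data mining scheme itself. As a hedged illustration of the application-agnostic idea only, the sketch below runs a simple z-score detector over one network-level metric per switch, with no knowledge of the applications generating the traffic. `TrafficMonitor`, its thresholds, and the choice of a z-score test are hypothetical stand-ins, not the paper's method.

```python
import math
from dataclasses import dataclass


@dataclass
class TrafficMonitor:
    """Hypothetical per-switch monitor for one metric (e.g. bytes/s on a port)."""
    threshold: float = 3.0  # z-score above which a sample is flagged
    warmup: int = 10        # samples to observe before flagging anything
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0         # running sum of squared deviations (Welford's method)

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to the history, then update stats."""
        anomalous = False
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std == 0:
                anomalous = value != self.mean
            else:
                anomalous = abs(value - self.mean) / std > self.threshold
        if not anomalous:
            # Only normal samples update the baseline, so an anomaly cannot drag it along.
            self.n += 1
            delta = value - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (value - self.mean)
        return anomalous
```

Because each switch keeps only a few numbers of state, one monitor per port can run independently across the fabric; a flagged sample could then trigger a simple local action such as rate-limiting the offending port.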
Submitted 14 June, 2019;
originally announced June 2019.
-
Trimming the Multipath for Efficient Dynamic Routing
Authors:
Adrian Sai-wah Tam,
Kang Xi,
H. Jonathan Chao
Abstract:
Multipath routing is a straightforward way to exploit path diversity to increase network throughput. Technologies such as OSPF ECMP use all the available equal-cost paths in the network to forward traffic; however, we argue that this is not necessary to load balance the network. In this paper, we consider multipath routing with only a limited number of end-to-end paths for each source and destination, and find that this can still load balance the traffic. We devise an algorithm to select a few paths for each source-destination pair so that, when all traffic is forwarded over these paths, we achieve a balanced load in the sense that the maximum link utilization is comparable to that of ECMP forwarding. When the constraint of using only shortest paths (i.e. equal-cost paths) is relaxed, we can even outperform ECMP in certain cases. As a result, a few end-to-end tunnels between each pair of source and destination nodes suffice to load balance the traffic.
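The abstract does not give the path-selection algorithm, so the following is a hypothetical greedy heuristic, not the authors' algorithm. It keeps at most `k` candidate paths per source-destination pair, handling large demands first and always picking the path whose busiest link would stay least utilized. `max_utilization` measures the metric the abstract uses. ECMP is approximated here as an even split over all candidate paths of a pair, whereas real ECMP splits hop by hop, so the comparison is only indicative.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]
Path = List[str]
Pair = Tuple[str, str]


def links_of(path: Path) -> List[Link]:
    """Directed links traversed by a node path."""
    return list(zip(path, path[1:]))


def max_utilization(routing: Dict[Pair, List[Path]], demand: Dict[Pair, float],
                    capacity: Dict[Link, float]) -> float:
    """Max link utilization when each pair splits its demand evenly over its paths."""
    load = {link: 0.0 for link in capacity}
    for pair, paths in routing.items():
        share = demand[pair] / len(paths)
        for path in paths:
            for link in links_of(path):
                load[link] += share
    return max(load[link] / capacity[link] for link in capacity)


def trim_paths(candidates: Dict[Pair, List[Path]], demand: Dict[Pair, float],
               capacity: Dict[Link, float], k: int) -> Dict[Pair, List[Path]]:
    """Hypothetical greedy: keep at most k candidate paths per pair."""
    load = {link: 0.0 for link in capacity}
    chosen: Dict[Pair, List[Path]] = {}
    for pair in sorted(candidates, key=lambda pr: -demand[pr]):  # big demands first
        pool = list(candidates[pair])
        n = min(k, len(pool))
        share = demand[pair] / n
        chosen[pair] = []
        for _ in range(n):
            # Pick the path whose worst link would be least utilized after adding `share`.
            best = min(pool, key=lambda path: max(
                (load[link] + share) / capacity[link] for link in links_of(path)))
            pool.remove(best)
            chosen[pair].append(best)
            for link in links_of(best):
                load[link] += share
    return chosen
```

On a small two-pair topology where each pair has two disjoint two-hop paths into a shared destination, this greedy keeps a single path per pair yet matches the even-split maximum utilization, which is the kind of outcome the abstract reports.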
Submitted 4 September, 2011;
originally announced September 2011.
-
Use of Devolved Controllers in Data Center Networks
Authors:
Adrian S. -W. Tam,
Kang Xi,
H. Jonathan Chao
Abstract:
In a data center network, for example, it is common to use controllers to manage resources in a centralized manner. Centralized control, however, imposes a scalability problem. In this paper, we investigate the use of multiple independent controllers instead of a single omniscient controller to manage resources. Each controller looks after only a portion of the network, but together they cover the whole network. This therefore solves the scalability problem. We use flow allocation as an example to see how this approach can manage bandwidth use in a distributed manner. The focus is on how to assign components of a network to the controllers so that (1) each controller only needs to look after a small part of the network, but (2) at least one controller can answer any request. We outline a way to configure the controllers to fulfill these requirements as a proof that the use of devolved controllers is possible. We also discuss several issues related to such an implementation.
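The two requirements in the abstract can be made concrete with a small sketch. Assuming (hypothetically) that a controller can answer a flow request only if every link on the request's path lies in its region, the code below greedily hands each path to the controller whose region would grow the least, keeping regions small (requirement 1) while ensuring each listed request has an answering controller (requirement 2). `assign_regions`, `answering_controllers`, and the greedy rule are illustrative assumptions, not the configuration method outlined in the paper.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]


def path_links(path: List[str]) -> Set[Link]:
    """Undirected links on a node path, stored with endpoints sorted."""
    return {tuple(sorted(edge)) for edge in zip(path, path[1:])}


def assign_regions(paths: List[List[str]], num_controllers: int) -> List[Set[Link]]:
    """Hypothetical greedy: give each path to the controller whose region grows least."""
    regions: List[Set[Link]] = [set() for _ in range(num_controllers)]
    for path in paths:
        links = path_links(path)
        # Cost = number of new links the controller must take on,
        # ties broken towards the currently smaller region.
        best = min(range(num_controllers),
                   key=lambda c: (len(links - regions[c]), len(regions[c])))
        regions[best] |= links
    return regions


def answering_controllers(regions: List[Set[Link]], path: List[str]) -> List[int]:
    """Controllers whose region contains every link of `path`."""
    links = path_links(path)
    return [c for c, region in enumerate(regions) if links <= region]
```

Every path passed to `assign_regions` is answerable by construction; a request on a path outside that list may be answered by no controller, which `answering_controllers` reports as an empty list.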
Submitted 29 March, 2011;
originally announced March 2011.