
doi 10.1145/3579371.3589071

Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses

Authors: Rakesh Nadig, Mohammad Sadrosadati, Haiyu Mao, Nika Mansouri Ghiasi, Arash Tavakkol, Jisung Park, Hamid Sarbazi-Azad, Juan Gómez Luna, Onur Mutlu

Abstract: The performance and capacity of solid-state drives (SSDs) are continuously improving to meet the increasing demands of modern data-intensive applications. Unfortunately, communication between the SSD controller and memory chips (e.g., 2D/3D NAND flash chips) is a critical performance bottleneck for many applications. SSDs use a multi-channel shared bus architecture where multiple memory chips connected to the same channel communicate with the SSD controller over only one path. As a result, path conflicts often occur during the servicing of multiple I/O requests, which significantly limits SSD parallelism. It is critical to handle path conflicts well to improve SSD parallelism and performance. Our goal is to fundamentally tackle the path conflict problem by increasing the number of paths between the SSD controller and memory chips at low cost. To this end, we build on the idea of using an interconnection network to increase the path diversity between the SSD controller and memory chips. We propose Venice, a new mechanism that introduces a low-cost interconnection network between the SSD controller and memory chips and utilizes the path diversity to intelligently resolve path conflicts. Venice employs three key techniques: 1) a simple router chip added next to each memory chip without modifying the memory chip design, 2) a path reservation technique that reserves a path from the SSD controller to the target memory chip before initiating a transfer, and 3) a fully-adaptive routing algorithm that effectively utilizes the path diversity to resolve path conflicts. Our experimental results show that Venice 1) improves performance by an average of 2.65x/1.67x over a baseline performance-optimized/cost-optimized SSD design across a wide range of workloads, and 2) reduces energy consumption by an average of 61% compared to a baseline performance-optimized SSD design.

Submitted 12 May, 2023; originally announced May 2023.

Comments: To appear in Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA), 2023

arXiv:2209.10914 [pdf, other]

Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources

Authors: Sina Darabi, Mohammad Sadrosadati, Joël Lindegger, Negar Akbarzadeh, Mohammad Hosseini, Jisung Park, Juan Gómez-Luna, Hamid Sarbazi-Azad, Onur Mutlu

Abstract: Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel applications. In many GPU applications, GPU memory bandwidth bottlenecks performance, causing underutilization of GPU cores. Hence, disabling many cores does not affect the performance of memory-bound workloads. While simply power-gating unused GPU cores would save energy, prior works attempt to better utilize GPU cores for other applications (ideally compute-bound), which increases the GPU's total throughput. In this paper, we introduce Morpheus, a new hardware/software co-designed technique to boost the performance of memory-bound applications. The key idea of Morpheus is to exploit unused core resources to extend the GPU last level cache (LLC) capacity. In Morpheus, each GPU core has two execution modes: compute mode and cache mode. Cores in compute mode operate conventionally and run application threads. However, for the cores in cache mode, Morpheus invokes a software helper kernel that uses the cores' on-chip memories (i.e., register file, shared memory, and L1) in a way that extends the LLC capacity for a running memory-bound workload. Morpheus adds a controller to the GPU hardware to forward LLC requests to either the conventional LLC (managed by hardware) or the extended LLC (managed by the helper kernel). Our experimental results show that Morpheus improves the performance and energy efficiency of a baseline GPU architecture by an average of 39% and 58%, respectively, across several memory-bound workloads. Morpheus' performance is within 3% of a GPU design that has a quadruple-sized conventional LLC. Morpheus can thus contribute to reducing the hardware dedicated to a conventional LLC by exploiting idle cores' on-chip memory resources as additional cache capacity.

Submitted 6 April, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022

arXiv:2201.04353 [pdf, other]

A simple model for citation curve

Authors: Y. C. Tay, Mostafa Rezazad, Hamid Sarbazi-Azad

Abstract: There is considerable interest in the citation count for an author's publications. This has led to many proposals for citation indices for characterizing citation distributions. However, there is so far no tractable model to facilitate the analysis of these distributions and the design of these indices. This paper presents a simple equation for such design and analysis. The equation has three parameters that are calibrated by three geometrical characteristics of a citation distribution. Its simple form makes it tractable. To demonstrate, the equation is used to derive closed-form expressions for various citation indices, analyze the effect of time and identify individual contribution to the Hirsch index for a group.

Submitted 12 January, 2022; originally announced January 2022.

Comments: 13 pages, 19 figures, 2 tables

arXiv:2102.01764 [pdf, other]

MANA: Microarchitecting an Instruction Prefetcher

Authors: Ali Ansari, Fatemeh Golshan, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

Abstract: L1 instruction (L1-I) cache misses are a source of performance bottlenecks. Sequential prefetchers are simple solutions to mitigate this problem; however, prior work has shown that these prefetchers leave considerable potential uncovered. This observation has motivated many researchers to come up with more advanced instruction prefetchers. In 2011, Proactive Instruction Fetch (PIF) showed that a hardware prefetcher could effectively eliminate all of the instruction-cache misses. However, its enormous storage cost makes it an impractical solution. Consequently, reducing the storage cost has been the main research focus in instruction prefetching over the past decade. Several instruction prefetchers, including RDIP and Shotgun, were proposed to offer PIF-level performance with significantly lower storage overhead. However, our findings show that there is a considerable performance gap between these proposals and PIF. While these proposals use different mechanisms for instruction prefetching, the performance gap is largely not because of the mechanism, and instead, is due to not having sufficient storage. Prior proposals suffer from one or both of the following shortcomings: (1) a large number of metadata records to cover the potential, and (2) a high storage cost of each record. The first problem causes metadata misses, and the second problem prohibits the prefetcher from storing enough records within reasonably-sized storage.

Submitted 2 February, 2021; originally announced February 2021.

Comments: 24 pages with 15 figures

arXiv:2101.00969 [pdf, other]

Understanding Power Consumption and Reliability of High-Bandwidth Memory with Voltage Underscaling

Authors: Seyed Saber Nabavi Larimi, Behzad Salami, Osman S. Unsal, Adrian Cristal Kestelman, Hamid Sarbazi-Azad, Onur Mutlu

Abstract: Modern computing devices employ High-Bandwidth Memory (HBM) to meet their memory bandwidth requirements. An HBM-enabled device consists of multiple DRAM layers stacked on top of one another next to a compute chip (e.g., CPU, GPU, or FPGA) in the same package. Although such HBM structures provide high bandwidth at a small form factor, the stacked memory layers consume a substantial portion of the package's power budget. Therefore, power-saving techniques that preserve the performance of HBM are desirable. Undervolting is one such technique: it reduces the supply voltage to decrease power consumption without reducing the device's operating frequency to avoid performance loss. Undervolting takes advantage of voltage guardbands put in place by manufacturers to ensure correct operation under all environmental conditions. However, reducing voltage without changing frequency can lead to reliability issues manifested as unwanted bit flips. In this paper, we provide the first experimental study of real HBM chips under reduced-voltage conditions. We show that the guardband regions for our HBM chips constitute 19% of the nominal voltage. Pushing the supply voltage down within the guardband region reduces power consumption by a factor of 1.5X for all bandwidth utilization rates. Pushing the voltage down further by 11% leads to a total of 2.3X power savings at the cost of unwanted bit flips. We explore and characterize the rate and types of these reduced-voltage-induced bit flips and present a fault map that enables the possibility of a three-factor trade-off among power, memory capacity, and fault rate.

Submitted 30 December, 2020; originally announced January 2021.

Comments: To appear at DATE 2021 conference

arXiv:2010.09330 [pdf, other]

Enabling High-Capacity, Latency-Tolerant, and Highly-Concurrent GPU Register Files via Software/Hardware Cooperation

Authors: Mohammad Sadrosadati, Amirhossein Mirhosseini, Ali Hajiabadi, Seyed Borna Ehsani, Hajar Falahati, Hamid Sarbazi-Azad, Mario Drumond, Babak Falsafi, Rachata Ausavarungnirun, Onur Mutlu

Abstract: Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes a hierarchical register file to reduce the register file power consumption by caching registers in a smaller register file cache. Unfortunately, this approach does not improve register access latency due to the low hit rate in the register file cache. In this paper, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical structure while keeping power consumption low. We observe that compile-time interval analysis enables us to divide GPU program execution into intervals with an accurate estimate of a warp's aggregate register working-set within each interval. The key idea of LTRF is to prefetch the estimated register working-set from the main register file to the register file cache under software control, at the beginning of each interval, and overlap the prefetch latency with the execution of other warps. We observe that register bank conflicts while prefetching the registers could greatly reduce the effectiveness of LTRF. Therefore, we devise a compile-time register renumbering technique to reduce the likelihood of register bank conflicts. Our experimental results show that LTRF enables high-capacity yet long-latency main GPU register files, paving the way for various optimizations. As an example optimization, we implement the main register file with emerging high-density high-latency memory technologies, enabling 8X larger capacity and improving overall GPU performance by 34%.

Submitted 19 October, 2020; originally announced October 2020.

Comments: To Appear in ACM Transactions on Computer Systems (TOCS)

arXiv:2009.00715 [pdf, other]

A Survey on Recent Hardware Data Prefetching Approaches with An Emphasis on Servers

Authors: Mohammad Bakhshalipour, Mehran Shakerinava, Fatemeh Golshan, Ali Ansari, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

Abstract: Data prefetching, i.e., the act of predicting an application's future memory accesses and fetching those that are not in the on-chip caches, is a well-known and widely-used approach to hide the long latency of memory accesses. The fruitfulness of data prefetching is evident to both industry and academia: nowadays, almost every high-performance processor incorporates a few data prefetchers for capturing various access patterns of applications; besides, there is a myriad of proposals for data prefetching in the research literature, where each proposal enhances the efficiency of prefetching in a specific way. In this survey, we discuss the fundamental concepts in data prefetching and study state-of-the-art hardware data prefetching approaches. Additional Key Words and Phrases: Data Prefetching, Scale-Out Workloads, Server Processors, and Spatio-Temporal Correlation.

Submitted 1 September, 2020; originally announced September 2020.

arXiv:2005.03451 [pdf, other]

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

Authors: Behzad Salami, Erhan Baturay Onural, Ismail Emir Yuksel, Fahrettin Koc, Oguz Ergin, Adrian Cristal Kestelman, Osman S. Unsal, Hamid Sarbazi-Azad, Onur Mutlu

Abstract: We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by the FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.

Submitted 30 December, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

Comments: To appear at the DSN 2020 conference

arXiv:1812.11473 [pdf, other]

ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning

Authors: Hajar Falahati, Pejman Lotfi-Kamran, Mohammad Sadrosadati, Hamid Sarbazi-Azad

Abstract: The memory bandwidth bottleneck is a major challenge in processing machine learning (ML) algorithms. In-memory acceleration has the potential to address this problem; however, it needs to address two challenges. First, an in-memory accelerator should be general enough to support a large set of different ML algorithms. Second, it should be efficient enough to utilize bandwidth while meeting the limited power and area budgets of the logic layer of a 3D-stacked memory. We observe that previous work fails to simultaneously address both challenges. We propose ORIGAMI, a heterogeneous set of in-memory accelerators that supports the compute demands of different ML algorithms and also uses an off-the-shelf compute platform (e.g., FPGA, GPU, TPU) to utilize bandwidth without violating strict area and power budgets. ORIGAMI offers a pattern-matching technique to identify similar computation patterns of ML algorithms and extracts a compute engine for each pattern. These compute engines constitute heterogeneous accelerators integrated on the logic layer of a 3D-stacked memory. A combination of these compute engines can execute any type of ML algorithm. To utilize available bandwidth without violating the area and power budgets of the logic layer, ORIGAMI comes with a computation-splitting compiler that divides an ML algorithm between in-memory accelerators and an out-of-the-memory platform in a balanced way and with minimum inter-communications. The combination of pattern matching and split execution offers a new design point for acceleration of ML algorithms. Evaluation results across 12 popular ML algorithms show that ORIGAMI outperforms a state-of-the-art accelerator with 3D-stacked memory in terms of performance and energy-delay product (EDP) by 1.5x and 29x (up to 1.6x and 31x), respectively. Furthermore, results are within a 1% margin of an ideal system that has unlimited compute resources on the logic layer of a 3D-stacked memory.

Submitted 9 January, 2019; v1 submitted 30 December, 2018; originally announced December 2018.

Comments: 11 pages, 9 figures

MSC Class: 68M01

arXiv:1809.08828 [pdf, other]

Die-Stacked DRAM: Memory, Cache, or MemCache?

Authors: Mohammad Bakhshalipour, HamidReza Zare, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

Abstract: Die-stacked DRAM is a promising solution for satisfying the ever-increasing memory bandwidth requirements of multi-core processors. Manufacturing technology has enabled stacking several gigabytes of DRAM modules on the active die, thereby providing orders of magnitude higher bandwidth as compared to the conventional DIMM-based DDR memories. Nevertheless, die-stacked DRAM, due to its limited capacity, cannot accommodate entire datasets of modern big-data applications. Therefore, prior proposals use it either as a sizable memory-side cache or as a part of the software-visible main memory. Cache designs can adapt themselves to the dynamic variations of applications but suffer from the tag storage/latency/bandwidth overhead. On the other hand, memory designs eliminate the need for tags, and hence, provide efficient access to data, but are unable to capture the dynamic behaviors of applications due to their static nature. In this work, we make a case for using the die-stacked DRAM partly as main memory and partly as a cache. We observe that in modern big-data applications there are many hot pages with a large number of accesses. Based on this observation, we propose to use a portion of the die-stacked DRAM as main memory to host hot pages, enabling serving a significant number of the accesses from the high-bandwidth DRAM without the overhead of tag-checking, and manage the rest of the DRAM as a cache, for capturing the dynamic behavior of applications. In this proposal, a software procedure pre-processes the application and determines hot pages, then asks the OS to map them to the memory portion of the die-stacked DRAM. The cache portion of the die-stacked DRAM is managed by hardware, caching data allocated in the off-chip memory.

Submitted 24 September, 2018; originally announced September 2018.

arXiv:1808.05024 [pdf, other]

Making Belady-Inspired Replacement Policies More Effective Using Expected Hit Count

Authors: Seyed Armin Vakil Ghahani, Sara Mahdizadeh Shahri, Mohammad Bakhshalipour, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

Abstract: Memory-intensive workloads operate on massive amounts of data that cannot be captured by last-level caches (LLCs) of modern processors. Consequently, processors encounter frequent off-chip misses, and hence, lose a significant performance potential. One way to reduce the number of off-chip misses is through using a well-behaved replacement policy in the LLC. Existing processors employ a variation of the least recently used (LRU) policy to determine a victim for replacement. Unfortunately, there is a large gap between what LRU offers and that of Belady's MIN, which is the optimal replacement policy. Belady's MIN requires selecting a victim with the longest reuse distance, and hence, is infeasible due to the need to know the future. Consequently, Belady-inspired replacement policies use Belady's MIN to derive an indicator to help them choose a victim for replacement. In this work, we show that the indicator that is used in the state-of-the-art Belady-inspired replacement policy is not decisive in picking a victim in a considerable number of cases, and hence, the policy has to rely on a standard metric (e.g., recency or frequency) to pick a victim, which is inefficient. We observe that there exist strong correlations among the hit counts of cache blocks in the same region of memory when Belady's MIN is the replacement policy. Taking advantage of this observation, we propose an expected-hit-count indicator for the memory regions and use it to improve the victim selection mechanism of Belady-inspired replacement policies when the main indicator is not decisive. Our proposal offers a 5.2% performance improvement over the baseline LRU and outperforms Hawkeye, which is the state-of-the-art replacement policy.
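Illustration (not part of the arXiv record): the abstract above contrasts LRU with Belady's MIN, which evicts the block whose next use lies furthest in the future. The following is a minimal sketch of both policies for a small fully associative cache, assuming the whole access trace is known in advance, which is exactly why MIN is infeasible in hardware. The function names and the toy trace are hypothetical, and the paper's expected-hit-count indicator is not modeled here.

```python
from collections import OrderedDict


def belady_min_misses(trace, capacity):
    """Count misses under Belady's MIN (assumes capacity >= 1)."""
    # next_use[i] = position of the next access to trace[i], or infinity.
    next_use = [float("inf")] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # block -> position of its next use
    misses = 0
    for i, block in enumerate(trace):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                # Evict the block that is reused furthest in the future.
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[block] = next_use[i]
    return misses


def lru_misses(trace, capacity):
    """Count misses under least-recently-used replacement."""
    cache = OrderedDict()  # oldest entry first
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
            continue
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used block
        cache[block] = True
    return misses


toy_trace = [1, 2, 3, 1, 2, 3]
print(belady_min_misses(toy_trace, 2), lru_misses(toy_trace, 2))
```

On this cyclic toy trace with two cache entries, LRU misses on every access while MIN keeps one block resident across the cycle, which illustrates the kind of gap the abstract refers to.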

Submitted 15 August, 2018; originally announced August 2018.

arXiv:1808.04864 [pdf, other]

Scale-Out Processors & Energy Efficiency

Authors: Pouya Esmaili-Dokht, Mohammad Bakhshalipour, Behnam Khodabandeloo, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

Abstract: Scale-out workloads like media streaming or Web search serve millions of users and operate on a massive amount of data, and hence, require enormous computational power. As the number of users is increasing and the size of data is expanding, even more computational power is necessary for powering up such workloads. Data centers with thousands of servers are providing the computational power necessary for executing scale-out workloads. As operating data centers requires enormous capital outlay, it is important to optimize them to execute scale-out workloads efficiently. Server processors contribute significantly to the data center capital outlay, and hence, are a prime candidate for optimizations. While data centers are constrained by power, and power consumption is one of the major components contributing to the total cost of ownership (TCO), a recently-introduced scale-out design methodology optimizes server processors for data centers using performance per unit area. In this work, we use a more relevant performance-per-power metric as the optimization criterion for optimizing server processors and reevaluate the scale-out design methodology. Interestingly, we show that a scale-out processor that delivers the maximum performance per unit area also delivers the highest performance per unit power.

Submitted 14 August, 2018; originally announced August 2018.

arXiv:1805.07269 [pdf, other]

Parallelizing Bisection Root-Finding: A Case for Accelerating Serial Algorithms in Multicore Substrates

Authors: Mohammad Bakhshalipour, Hamid Sarbazi-Azad

Abstract: Multicore architectures dominate today's processor market. Even though the number of cores and threads is high and continues to grow, inherently serial algorithms do not benefit from the abundance of cores and threads. In this paper, we propose Runahead Computing, a technique which uses idle threads in a multi-threaded architecture for accelerating the execution time of serial algorithms. Through detailed evaluations targeting both CPU and GPU platforms and a specific serial algorithm, our approach reduces the execution latency by up to 9x in our experiments.
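Illustration (not part of the arXiv record): the abstract does not spell out how Runahead Computing works, so the sketch below should not be read as the authors' mechanism. It only shows one generic way extra threads can shorten a serial bisection: probing several interior points per step so the bracket shrinks by a factor of workers + 1 instead of 2. The names bisect and multisect and the workers parameter are hypothetical. In CPython, threads will not speed up a pure-Python f because of the global interpreter lock, so the code demonstrates the logic rather than a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def bisect(f, lo, hi, tol=1e-9):
    """Serial bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0:
            return mid
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid  # root lies in the right half
        else:
            hi = mid  # root lies in the left half
    return (lo + hi) / 2


def multisect(f, lo, hi, workers=4, tol=1e-9):
    """Probe `workers` interior points concurrently on each step."""
    f_lo = f(lo)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while hi - lo > tol:
            step = (hi - lo) / (workers + 1)
            points = [lo + step * (i + 1) for i in range(workers)]
            values = list(pool.map(f, points))
            for x, fx in zip(points, values):
                if fx == 0:
                    return x
                if (f_lo < 0) != (fx < 0):
                    hi = x  # first sign change bounds the root from above
                    break
                lo, f_lo = x, fx  # same sign as the left end: move it right
    return (lo + hi) / 2


def square_minus_two(x):
    return x * x - 2


print(bisect(square_minus_two, 0.0, 2.0))
print(multisect(square_minus_two, 0.0, 2.0))
```

Both calls should converge to a value close to the square root of 2; the multi-point variant needs fewer sequential steps at the cost of more function evaluations per step.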

Submitted 10 May, 2018; originally announced May 2018.

Comments: 5 pages, 7 figures

arXiv:0710.1924 [pdf]

A Heuristic Routing Mechanism Using a New Addressing Scheme

Authors: Mohsen Ravanbakhsh, Yasin Abbasi-Yadkori, Maghsoud Abbaspour, Hamid Sarbazi-Azad

Abstract: Current methods of routing are based on network information in the form of routing tables, in which routing protocols determine how to update the tables according to the network changes. Despite the variability of data in routing tables, node addresses are constant. In this paper, we first introduce the new concept of variable addresses, which results in a novel framework to cope with routing problems using heuristic solutions. Then we propose a heuristic routing mechanism based on the application of genes for determination of network addresses in a variable address network and describe how this method flexibly solves different problems and induces new ideas in providing integral solutions for a variety of problems. The case of ad-hoc networks is where simulation results are more supportive and original solutions have been proposed for issues like mobility.

Submitted 10 October, 2007; originally announced October 2007.

Comments: 8 pages, because of lack of space journal reference just contains the reference to the proceeding

Journal ref: Proceedings of First International Conference on Bio Inspired models of Networks, Information and Computing Systems (BIONETICS), Cavalese, Italy, December 2006

Showing 1–14 of 14 results for author: Sarbazi-Azad, H