-
Revising the classic computing paradigm and its technological implementations
Authors:
János Végh
Abstract:
Today's computing is said to be based on the classic paradigm proposed by von Neumann three-quarters of a century ago. However, that paradigm was justified only for the timing relations of vacuum tubes. Technological development invalidated the classic paradigm (but not the model!) and led to catastrophic performance losses in computing systems, from gate-level operation to large networks, including neuromorphic ones. The paper reviews the critical points of the classic paradigm and scrutinizes the confusion surrounding it. It discusses some of the consequences of improper technological implementation, from shared media to parallelized operation. The model is perfect, but it is applied outside its range of validity. The paradigm is extended by providing the "procedure" that enables computing science to work with cases where the transfer time is not negligible compared with the processing time.
Submitted 16 November, 2020;
originally announced November 2020.
-
von Neumann's missing "Second Draft": what it should contain
Authors:
János Végh
Abstract:
Computing science is based on a computing paradigm that is not valid anymore for today's technological conditions. The reason is that the transmission time even inside the processor chip, but especially between the components of the system, is not negligible anymore. The paper introduces a quantitative measure for dispersion, which is vital for both computing performance and energy consumption, and demonstrates how its value increased with the changing technology. The temporal behavior (including the dispersion of the commonly used synchronization clock time) of computing components has a critical impact on the system's performance at all levels, as demonstrated from gate-level operation to supercomputing. The same effect limits the utility of the researched new materials/effects if the related transfer time cannot be proportionally mitigated. von Neumann's model is perfect, but now it is used outside of its range of validity. The correct procedure to consider the transfer time for the present technological background is also derived.
Submitted 9 November, 2020;
originally announced November 2020.
-
On the spatiotemporal behavior in biology-mimicking computing systems
Authors:
János Végh,
Ádám J. Berki
Abstract:
The payload performance of conventional computing systems, from single processors to supercomputers, has reached the limits that nature allows. Both the growing demand to cope with "big data" (based on, or assisted by, artificial intelligence) and the interest in understanding the operation of our brain more completely have stimulated efforts to build biology-mimicking computing systems from inexpensive conventional components and to build different ("neuromorphic") computing systems. On the one hand, those systems require an unusually large number of processors, which introduces performance limitations and nonlinear scaling. On the other hand, neuronal operation differs drastically from conventional workloads. Conventional computing (including both its mathematical background and physical implementation) is based on assuming instant interaction, while biological neuronal systems have a "spatiotemporal" behavior. This difference alone makes imitating biological behavior in technical implementations hard. Besides, recent issues in computing have drawn attention to the fact that temporal behavior is a general feature of computing systems, too. Some of its effects in both biological and technical systems have already been noticed. Nevertheless, the handling of those issues is incomplete or improper. Introducing temporal logic, based on the Minkowski transform, gives quantitative insight into the operation of both kinds of computing systems and provides a natural explanation of decades-old empirical phenomena. Without considering their temporal behavior correctly, neither an effective implementation nor a true imitation of biological neural systems is possible.
Submitted 23 September, 2020; v1 submitted 18 September, 2020;
originally announced September 2020.
-
Introducing temporal behavior to computing science
Authors:
János Végh
Abstract:
The abstraction introduced by von Neumann correctly reflected the state of the art 70 years ago.
Although it omitted data transmission time between components of the computer, it served as an excellent base for classic computing for decades.
Modern computer components and architectures, however, require that their temporal behavior be considered: data transmission time in contemporary systems may be higher than their processing time.
Using the classic paradigm leaves some issues unexplained, from enormously high power consumption to days-long training of artificial neural networks to failures of some cutting-edge supercomputer projects.
The paper introduces the hitherto missing temporal behavior (a temporal logic) into computing, while keeping the solid computing science base.
A careful analysis reveals that, when the temporal behavior of components and architectural principles is considered, the mysterious issues have a trivial explanation.
Some classic design principles must be revised, and the temporal logic enables us to design more powerful and efficient computing.
Submitted 26 September, 2020; v1 submitted 31 May, 2020;
originally announced June 2020.
-
How to extend the Single-Processor Paradigm to the Explicitly Many-Processor Approach
Authors:
János Végh
Abstract:
The computing paradigm invented for processing a small amount of data on a single segregated processor cannot meet the challenges set by the present-day computing demands. The paper proposes a new computing paradigm (extending the old one to use several processors explicitly) and discusses some questions of its possible implementation. Some advantages of the implemented approach, illustrated with the results of a loosely-timed simulator, are presented.
Submitted 31 May, 2020;
originally announced June 2020.
-
Which scaling rule applies to Artificial Neural Networks
Authors:
János Végh
Abstract:
Experience shows that cooperating and communicating computing systems, comprising segregated single processors, have severe performance limitations. In his classic "First Draft", von Neumann warned that using a "too fast processor" vitiates his simple "procedure" (but not his computing model!), and furthermore that using the classic computing paradigm for imitating neuronal operations is unsound. Amdahl added that large machines, comprising many processors, have an inherent disadvantage. Given that the components of ANNs communicate heavily with each other, that they are built from a large number of components designed and fabricated for use in conventional computing, and that they attempt to mimic biological operation using improper technological solutions, their achievable payload computing performance is conceptually modest. The type of workload that AI-based systems generate leads to an exceptionally low payload computational performance, and their design and technology limit their size to just above "toy"-level systems: the scaling of processor-based ANN systems is strongly nonlinear. Given the proliferation and growing size of ANN systems, we suggest ideas for estimating in advance the efficiency of the device or application. By analyzing published measurements, we provide evidence that data transfer time drastically influences both the performance and the feasibility of ANNs. We discuss how some major theoretical limiting factors, the layer structure of ANNs, and the methods used to implement their communication affect their efficiency. The paper starts from von Neumann's original model, without neglecting the transfer time relative to the processing time, and derives an appropriate interpretation and handling of Amdahl's law. It shows that, in that interpretation, Amdahl's Law correctly describes ANNs.
Submitted 30 November, 2021; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Do we know the operating principles of our computers better than those of our brain?
Authors:
János Végh,
Ádám J. Berki
Abstract:
The increasing interest in understanding the behavior of the biological neural networks, and the increasing utilization of artificial neural networks in different fields and scales, both require a thorough understanding of how neuromorphic computing works. On the one side, the need to program those artificial neuron-like elements, and, on the other side, the necessity for a large number of such elements to cooperate, communicate and compute during tasks, need to be scrutinized to determine how efficiently conventional computing can assist in implementing such systems. Some electronic components bear a surprising resemblance to some biological structures. However, combining them with components that work using different principles can result in systems with very poor efficacy. The paper discusses how the conventional principles, components and thinking about computing limit mimicking the biological systems. We describe what changes will be necessary in the computing paradigms to get closer to the marvelously efficient operation of biological neural networks.
Submitted 6 May, 2020;
originally announced May 2020.
-
How deep the machine learning can be
Authors:
János Végh
Abstract:
Today we live in the age of artificial intelligence and machine learning; from small startups to HW or SW giants, everyone wants to build machine intelligence chips and applications. The task, however, is hard, and not only because of the size of the problem: the technology one can utilize (and the paradigm it is based upon) strongly degrades the chances of succeeding efficiently. Today, single-processor performance has practically reached the limits the laws of nature allow. The only feasible way to achieve the needed high computing performance seems to be parallelizing many sequentially working units. The laws of (massively) parallelized computing, however, are different from those experienced when assembling and utilizing systems comprising just a few single processors. As machine learning is mostly based on conventional computing (processors), we scrutinize the (known, but somewhat faded) laws of parallel computing as they concern AI. This paper attempts to review some of the caveats, especially concerning scaling the computing performance of AI solutions.
Submitted 2 May, 2020;
originally announced May 2020.
-
Re-evaluating scaling methods for distributed parallel systems
Authors:
János Végh
Abstract:
The paper explains why Amdahl's Law must be interpreted specifically for distributed parallel systems and why it has generated so many debates, discussions, and abuses. We set up a general model and list many of the terms affecting parallel processing. We scrutinize the validity of neglecting certain terms in different approximations, with special emphasis on the famous scaling laws of parallel processing. We clarify that, when the terms are interpreted correctly, Amdahl's Law is the governing law of all kinds of parallel processing. Among other things, Amdahl's Law describes the history of supercomputing and the inherent performance limitations of the different kinds of parallel processing, and it is the basic law of the 'modern computing' paradigm that computing systems working under extreme computing conditions desperately need.
Submitted 17 April, 2020; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Finally, how many efficiencies supercomputers have? And, what do they measure?
Authors:
János Végh
Abstract:
Using an extremely large number of processing elements in computing systems leads to unexpected phenomena, such as different efficiencies of the same system for different tasks, that cannot be explained within the frame of the classical computing paradigm. The simple non-technical model introduced here, which considers the temporal behavior of the components, enables us to set up the frame and formalism needed to explain those unexpected experiences around supercomputing. Introducing temporal behavior into computer science also explains why only extreme-scale computing enabled us to reveal the experienced limitations. The paper shows that the degradation of efficiency of parallelized sequential systems is a natural consequence of the classical computing paradigm rather than an engineering imperfection. The workload that supercomputers run is largely responsible for wasting energy, as well as for limiting the size and type of tasks. Case studies provide insight into how different contributions compete to dominate the resulting payload performance of a computing system, and how enhancing the interconnection technology made computing+communication dominant in defining the efficiency of supercomputers. Our model also enables us to derive predictions about supercomputer performance limitations for the near future, and it provides hints for enhancing supercomputer components. Phenomena experienced in large-scale computing show interesting parallels with phenomena experienced in science more than a century ago, the study of which led to the development of modern science.
Submitted 11 July, 2022; v1 submitted 5 January, 2020;
originally announced January 2020.
-
The need for modern computing paradigm: Science applied to computing
Authors:
János Végh
Abstract:
More than a hundred years ago, 'classic physics' was at its full power, with just a few unexplained phenomena, which nevertheless led to a revolution and the development of 'modern physics'. Today computing is in a similar position: computing is a sound success story, with exponentially growing utilization, but with a growing number of difficulties and unexpected issues as it moves towards extreme utilization conditions. In physics, studying nature under extreme conditions has led to the understanding of relativistic and quantal behavior. Quite similarly, in computing some phenomena encountered under extreme (computing) conditions cannot be understood on the basis of the 'classic computing paradigm'. The paper draws attention to the fact that under extreme conditions qualitatively different behaviors may be encountered in both physics and computing, and points out that certain formerly unnoticed or neglected aspects make it possible to explain new phenomena as well as to enhance computing features. Moreover, an idea for implementing the modern computing paradigm is proposed.
Submitted 5 January, 2020; v1 submitted 2 August, 2019;
originally announced August 2019.
-
The performance wall of parallelized sequential computing: the dark performance and the roofline of performance gain
Authors:
János Végh
Abstract:
Computing performance today is developing mainly through parallelized sequential computing, in many forms. The paper scrutinizes whether the performance of that type of computing has an upper limit. Simple considerations point out that the theoretically possible upper bound is practically achieved, and that the main obstacle to stepping further is the presently used computing paradigm and implementation technology. In addition to the former "walls", the "performance wall" must also be considered. As the paper points out, similarly to "dark silicon", "dark performance" is always present in parallelized many-processor systems.
Submitted 2 August, 2019;
originally announced August 2019.
-
Limitations of performance of Exascale Applications and supercomputers they are running on
Authors:
János Végh
Abstract:
The paper highlights that the cooperation of the components of computing systems receives even more focus in the coming age of exascale computing. It finds that inherent performance limitations exist and identifies the major critical contributions to the performance of many-many processor systems. The extended and reinterpreted simple Amdahl model describes the behavior of existing supercomputers surprisingly well, and explains some mystical happenings around high-performance computing. It is pointed out that, using the present technology and paradigm, only marginal development of performance is possible, and that the major obstacle to higher-performance applications is the 70-year-old computing paradigm itself. A way to step forward is also suggested.
Submitted 15 August, 2018;
originally announced August 2018.
-
Renewing computing paradigms for more efficient parallelization of single-threads
Authors:
János Végh
Abstract:
Computing is still based on the 70-year-old paradigms introduced by von Neumann. The need for more performant, comfortable and safe computing has forced the development and use of several tricks in both hardware and software. Until now, technology has made it possible to increase performance without changing the basic computing paradigms. The recent stalling of single-threaded computing performance, however, requires redesigning computing so that it can provide the expected performance. To do so, the computing paradigms themselves must be scrutinized. The limitations caused by the too restrictive interpretation of the computing paradigms are demonstrated, an extended computing paradigm is introduced, ideas about changing elements of the computing stack are suggested, and some implementation details of both hardware and software are discussed. The resulting new computing stack offers considerably higher computing throughput, a simplified hardware architecture, drastically improved real-time behavior and, in general, a simplified and more efficient computing stack.
Submitted 22 February, 2018;
originally announced March 2018.
-
Statistical considerations on limitations of supercomputers
Authors:
János Végh
Abstract:
Supercomputer building is a many-scene, many-author game, comprising a lot of different technologies, manufacturers and ideas. By checking the data available in the public database in a systematic way, some general tendencies and limitations can be deduced, both for the past and the future. The feasibility of building exa-scale computers, as well as their limitations and utilization, is also discussed. The statistical considerations provide strong support for the conclusions.
Submitted 28 March, 2018; v1 submitted 24 October, 2017;
originally announced October 2017.
-
How Amdahl's law restricts supercomputer applications and building ever bigger supercomputers
Authors:
János Végh
Abstract:
This paper reinterprets Amdahl's law in terms of execution time and applies this simple model to supercomputing. The systematic discussion results in practical formulas that make it possible to calculate the expected running time on a large number of processors from experimental runs on a small number of processors, and delivers a quantitative measure of the computational efficiency of supercomputing applications. By separating the non-parallelizable contribution into fractions according to their origin, Amdahl's law makes it possible to derive a timeline for supercomputers (quite similar to Moore's law) and describes why Amdahl's law limits the size of supercomputers. The paper validates that Amdahl's 50-year-old model (with a slight extension) correctly describes the performance limitations of present supercomputers. Using some simple and reasonable assumptions, the absolute performance bound of supercomputers is derived, together with the conclusion that serious enhancements are still necessary to achieve the exaFLOPS dream value.
Submitted 29 December, 2017; v1 submitted 4 August, 2017;
originally announced August 2017.
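As an illustration of the execution-time form of Amdahl's law mentioned in the abstract above, the following Python sketch inverts the textbook formula to estimate a parallel fraction from one measured run and then extrapolates to a larger processor count. It is not taken from the paper: the function names and the timings are hypothetical, and the paper's own formulas (including its extension) may differ.

```python
def amdahl_time(t1: float, alpha: float, n: int) -> float:
    """Run time on n processors predicted by textbook Amdahl's law.

    t1    : run time on a single processor
    alpha : parallelizable fraction of the work (0 <= alpha <= 1)
    """
    return t1 * ((1.0 - alpha) + alpha / n)


def effective_parallel_fraction(t1: float, tn: float, n: int) -> float:
    """Parallel fraction implied by one measured run on n > 1 processors.

    Obtained by solving S = 1 / ((1 - alpha) + alpha / n) for alpha,
    where S = t1 / tn is the measured speedup.
    """
    if n <= 1:
        raise ValueError("need a run on more than one processor")
    speedup = t1 / tn
    return (n / (n - 1.0)) * (speedup - 1.0) / speedup


# Hypothetical measurements (assumed for illustration only).
t1 = 100.0   # seconds on 1 processor
t8 = 15.0    # seconds on 8 processors

alpha = effective_parallel_fraction(t1, t8, 8)
predicted_1024 = amdahl_time(t1, alpha, 1024)
serial_floor = t1 * (1.0 - alpha)  # run time can never drop below this

print(f"effective parallel fraction: {alpha:.4f}")
print(f"predicted time on 1024 processors: {predicted_1024:.3f} s")
print(f"serial floor: {serial_floor:.3f} s")
```

The serial floor shows why adding processors eventually stops helping under this model: the non-parallelizable part of the run time is unaffected by the processor count.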
-
Can Broken Multicore Hardware be Mended?
Authors:
János Végh
Abstract:
A suggestion is made for mending multicore hardware, which has been diagnosed as broken.
Submitted 12 November, 2016;
originally announced November 2016.
-
A new kind of parallelism and its programming in the Explicitly Many-Processor Approach
Authors:
János Végh
Abstract:
Processor accelerators are effective because they do not work (completely) on the principles of stored-program computers. They use some kind of parallelism, and it is rather hard to program them effectively: a parallel architecture has to be programmed by means of (and thinking in) sequential programming. The recently introduced EMPA architecture uses a new kind of parallelism, which offers the potential of reaching a higher degree of parallelism, and also provides extra possibilities and challenges. It not only provides synchronization and inherent parallelization, but also takes over some duties typically offered by the OS, and even opens the until now closed machine instructions to the end-user. A toolchain for the EMPA architecture with Y86 cores has been prepared, including an assembler and a cycle-accurate simulator. The assembler is equipped with some meta-instructions, which make it possible to use all the advanced possibilities of the EMPA architecture while providing (nearly) conventional-style programming. The cycle-accurate simulator is able to execute the EMPA-aware object code, and is a good tool for developing algorithms for EMPA.
Submitted 24 August, 2016;
originally announced August 2016.
-
Comments on the parallelization efficiency of the Sunway TaihuLight supercomputer
Authors:
János Végh
Abstract:
In the world of supercomputers, the large number of processors requires minimizing the inefficiencies of parallelization, which appear as a sequential part of the program from the point of view of Amdahl's law. The recently suggested new figure of merit is applied to the recently presented supercomputer, and the timeline of "Top 500" supercomputers is scrutinized using the metric. It is demonstrated that, in addition to its computing performance and power consumption, the new supercomputer is also excellent in the efficiency of parallelization. Based on the suggested merit, a "Moore-law"-like observation is derived for the timeline of the parallelization efficacy of supercomputers.
Submitted 31 July, 2016;
originally announced August 2016.
-
A configurable accelerator for manycores: the Explicitly Many-Processor Approach
Authors:
János Végh
Abstract:
A new approach to designing processor accelerators is presented. A new computing model and a special kind of accelerator with a dynamic (end-user programmable) architecture are suggested. The new model considers a processor in which a newly introduced supervisor layer coordinates the job of the cores. The cores have the ability (based on the parallelization information provided by the compiler, and with the help of the supervisor) to outsource part of the job they received to some neighbouring core. The introduced changes essentially and advantageously modify the architecture and operation of computing systems. The computing throughput drastically increases, the efficiency of the technological implementation (computing performance per logic gate) increases, the non-payload activity for using operating system services decreases, the real-time behavior changes advantageously, and connecting accelerators to the processor is greatly simplified. Here only some details of the architecture and operation of the processor are discussed; the rest is described elsewhere.
Submitted 6 July, 2016;
originally announced July 2016.
-
A figure of merit for describing the performance of scaling of parallelization
Authors:
János Végh,
Péter Molnár,
József Vásárhelyi
Abstract:
With the spread of multi- and many-core processors, an increasingly typical task is to re-implement source code originally written for a single processor so that it runs on more than one core. Since this is a serious investment, it is important to decide how much effort pays off, and whether the resulting implementation performs as well as it could. Amdahl's law provides some theoretical upper limits for the performance gain reachable through parallelizing the code, but it needs detailed architectural knowledge of the program code, does not consider the housekeeping activity needed for parallelization, and cannot tell how the actual stage of the parallelization implementation performs. The present paper suggests a quantitative measure for that goal. This figure of merit is derived experimentally, from the measured running time and the number of threads/cores. It can be used to quantify the parallelization technology used, the connection between the computing units, the acceleration technology under the given conditions, the communication method within the SoC, or the performance of the software team/compiler.
Submitted 22 July, 2016; v1 submitted 8 June, 2016;
originally announced June 2016.