DRAM Power and Thermal Optimizations in Emerging Multi-Core Technologies

Holistic Computer Architectures for Nanoscale Processors

Gokhan Memik

Department of Electrical Engineering and Computer Science

Northwestern University

ABSTRACT

Technology scaling has arguably been the most important contributor to the increase in the computational power of microprocessors, and it indirectly affects many aspects of modern life. One of the most important challenges facing continued technology scaling is the increasing variability in device characteristics, i.e., process variations. Process variations manifest themselves as fluctuations in the performance, power, and reliability of manufactured processors. To mitigate these effects, this project develops variation-tolerant architectures that utilize "holistic optimizations", i.e., techniques that simultaneously consider various computational abstraction levels. Two intriguing examples of such techniques are the use of economic incentives to measure the quality of architectures and optimization for user-perceived performance to increase lifetime reliability. In addition, this project studies the development of understudy components and modular architectures. Understudy components are structures that can replace parts of the chip that fail, whereas modular architectures are processors built using a small variety of similar structures that can be dynamically allocated. These research tasks have the potential to have a large impact on both academic and industrial research. Process variations cause immense problems for all processor manufacturers, and their rising impact should be understood in detail. In addition, there is a growing need to understand how different architectures affect the overall targets of processor manufacturers as well as the users of these processors. Such an understanding, which will be developed in this project, can then be used to architect much more efficient processors.

Summary

Technology scaling (i.e., the continuous decrease in device dimensions) has been a highly successful process for the development of silicon technology for the past four decades. It is arguably the most important contributor to the increase in the computational power of microprocessors, and it indirectly affects many aspects of modern life. However, as the silicon industry moves to smaller technologies, several physical phenomena are becoming dominant. Among these, the two most important challenges facing Moore’s Law and continued technology scaling are the growing standby power dissipation and the increasing variability in device characteristics. This project targets the problem of variability in device characteristics, i.e., process variations. Process variations manifest themselves as fluctuations in the performance, power, and reliability of manufactured processors. These effects, in turn, cause important problems for processor manufacturers (such as reduced chip yields, increased design times, and reduced reliability). To address these problems, we develop variation-tolerant architectures that a) tolerate failures and variations and b) have improved lifetime reliability. We also incorporate novel holistic optimization goals for such architectures: we argue that economic incentives and user-perceived performance should be taken into consideration while developing such architectures. Specifically, we aim to achieve the following research goals:

1. Measuring the economic impacts of architectural decisions,

2. Increasing lifetime reliability through optimizing for user-perceived performance,

3. Understanding the effects of process variations on the behavior of representative architectures,

4. Minimizing the negative effects of variations on single and multicore processors through the development of novel variation-tolerant architectures, and

5. Understanding and minimizing the impact of variations on 3D integration.

To achieve these goals, we conduct research in the following areas:

(1) Comparison metrics based on economic incentives. As the impact of process variations increases, traditional performance metrics such as instructions-per-cycle (IPC) are becoming insufficient to capture the complete range of impacts of architectural configurations. Therefore, there is a growing need to replace or augment these metrics with new ones that take process variations into consideration. In this project, we show that economic incentives (such as profit estimation) can be used for this purpose.

(2) Methods to incorporate user-perceived performance to increase lifetime reliability. Traditionally, different architectures are compared using metrics such as IPC or benchmark ratings (e.g., SPEC rating). In this project, we argue that the performance observed by the user, i.e., user-perceived performance, should be taken into account when making architectural decisions. We show that by optimizing for user-perceived performance, the lifetime reliability of processors can be increased.

(3) Development of models. Although various models exist that can estimate the impact of variations on circuits, the effects of variations on architectures have not been explored in detail. Hence, we develop models to understand the impact of process variations on the performance and power behavior of architectures.

(4) Innovative variation-tolerant architectures. We develop understudy components, i.e., structures that can replace parts of the chip that fail. We also investigate modular architectures, i.e., architectures that are built using a small variety of similar structures that can be dynamically allocated.

(5) Variation models for 3D integration and mitigating the effects of variations on 3D chips. Despite various research activities on 3D integration, the impact of process variations on 3D chips has not been explored. We first construct models to understand such impacts and then develop methods to mitigate them. These methods include post-manufacture addition of variation-aware layers, division of critical paths into multiple layers to improve the performance distribution, and asynchronous layers to isolate variation-induced problems.
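
As a rough illustration of the economic comparison metric described in area (1), the following Python sketch estimates revenue per manufactured chip under speed binning. It is only a minimal sketch under stated assumptions, not the project's actual model: the normal distribution of chip frequencies, the bin thresholds, the prices, and the two designs are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical speed bins: (minimum frequency in GHz, selling price in dollars),
# sorted from fastest to slowest. All numbers are illustrative assumptions.
BINS = [(3.0, 300.0), (2.5, 200.0), (2.0, 100.0)]


def sample_chip_frequencies(nominal_ghz, sigma_ghz, n_chips, seed=0):
    """Draw per-chip maximum frequencies from a normal distribution (assumed model)."""
    rng = random.Random(seed)
    return [rng.gauss(nominal_ghz, sigma_ghz) for _ in range(n_chips)]


def bin_price(freq_ghz, bins=BINS):
    """Price of the fastest bin the chip qualifies for; 0.0 if it misses every bin."""
    for min_freq, price in bins:
        if freq_ghz >= min_freq:
            return price
    return 0.0


def expected_revenue_per_chip(freqs, bins=BINS):
    """Average selling price over all manufactured chips; discarded chips count as 0."""
    return sum(bin_price(f, bins) for f in freqs) / len(freqs)


# Two hypothetical designs: A is faster on average but more sensitive to variations.
design_a = sample_chip_frequencies(nominal_ghz=3.10, sigma_ghz=0.50, n_chips=10000, seed=1)
design_b = sample_chip_frequencies(nominal_ghz=3.05, sigma_ghz=0.05, n_chips=10000, seed=1)

for name, freqs in (("A", design_a), ("B", design_b)):
    mean_freq = sum(freqs) / len(freqs)
    print(f"design {name}: mean frequency {mean_freq:.2f} GHz, "
          f"revenue per chip ${expected_revenue_per_chip(freqs):.2f}")
```

With these illustrative parameters, design A has the higher mean frequency, yet design B should earn more per chip because far fewer of its chips fall into slower bins or miss every bin. A performance-only metric would not expose this difference, which is the motivation for augmenting such metrics with an economic one.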

This work is supported by

National Science Foundation Grant #CCF-0747201 (Program Officer: Chitaranjan Das)