For the past half century, Moore's Law has been the fundamental driver of
high-performance computing. Continued CMOS technology scaling has doubled
the transistor density of VLSI systems and has provided a predictable
40% performance improvement in single-core processors every 18 to 24
months. However, with the end of Dennard scaling, the era of scaling frequency
and performance without increasing power density is over. Since 2005,
the semiconductor industry has shifted to multi-core and many-core processors
in order to sustain the proportional scaling of performance along with
transistor count increases. One of the critical challenges for many-core
system design is to reduce the power dissipation and improve the energy
efficiency of the chip. Researchers are seeking innovative low-power
architectures and techniques to relieve the "dark silicon" problem and
effectively convert transistors into performance.
To demonstrate that many-core processors with network-on-chip interconnects
are a promising architecture for high-performance energy-efficient computing,
16 Advanced Encryption Standard (AES) engines are proposed on a fine-grained
many-core system by exploring different granularities of data-level and
task-level parallelism. The smallest design utilizes only six cores for
offline key expansion and eight cores for online key expansion, while
the largest requires 107 cores and 137 cores, respectively. In comparison
with published AES cipher implementations on general purpose processors,
the designs have 3.5–15.6 times higher throughput per unit of chip
area and 8.2–18.1 times higher energy efficiency. Moreover, the
design shows 2.0 times higher throughput than the TI DSP C6201, and 3.3
times higher throughput per unit of chip area and 2.9 times higher energy
efficiency than the GeForce 8800 GTX.
Next, a scalable joint local and global dynamic voltage and frequency
scaling (DVFS) scheme is proposed to further improve the energy efficiency
for many-core systems by monitoring online workload variations. The local
algorithm selects the voltage and frequency pair for each individual core
based on its FIFO occupancy and stall information, while the global algorithm
tunes the global voltage supplies based on the workload of all active
processors. To demonstrate the effectiveness of the proposed solution,
a suite of benchmarks is tested on a many-core globally asynchronous
locally synchronous (GALS) platform. The experimental results show that the
proposed approach can achieve near-optimal power savings under performance
constraints. Different local algorithms are compared in terms of power savings,
voltage switching frequency, and response delay to workload variation. The
impact of the number of voltage supplies and global voltage tuning resolution
on the global algorithm is also investigated.
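As an illustration of the kind of local decision described above, the following is a minimal sketch of a per-core controller that picks a voltage/frequency level from FIFO occupancy and stall information. It is not the dissertation's algorithm: the level table `VF_LEVELS`, the thresholds `high` and `low`, the `CoreSample` type, and the rule that a stalled core steps down are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical voltage/frequency pairs (volts, MHz), ordered lowest to highest.
VF_LEVELS = [(0.7, 300), (0.9, 600), (1.1, 900), (1.3, 1200)]


@dataclass
class CoreSample:
    """One monitoring interval for a single core (illustrative fields)."""
    fifo_occupancy: float  # fraction of the input FIFO in use, 0.0 to 1.0
    stalled: bool          # True if the core stalled during the interval


def next_level(level: int, sample: CoreSample,
               high: float = 0.75, low: float = 0.25) -> int:
    """Return the index into VF_LEVELS to use for the next interval."""
    if sample.fifo_occupancy >= high:
        # Input data is backing up, so the core is too slow: step up one level.
        return min(level + 1, len(VF_LEVELS) - 1)
    if sample.stalled or sample.fifo_occupancy <= low:
        # The core is waiting on its neighbors or has little queued work,
        # so it is assumed to be faster than needed: step down one level.
        return max(level - 1, 0)
    # Occupancy is in the middle band and the core is busy: hold the level.
    return level
```

Stepping one level at a time with a hold band between the two thresholds is one simple way to limit voltage switching frequency, at the cost of a slower response to abrupt workload changes.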
To further improve the energy efficiency beyond traditional DVFS, core
scaling is proposed by introducing an extra dimension beyond supply
voltage and clock frequency scaling. This dissertation addresses the
problem of minimizing the power dissipation of many-core systems under
performance constraints by choosing an appropriate number of active cores
and per-core voltage/frequency levels. A genetic-algorithm-based solution
is proposed to solve the problem. Experiments with real applications show
that (1) dynamically scaling the number of active cores can improve the
energy efficiency by 5% to 42% compared with per-core DVFS for different
performance requirements; (2) core scaling favors systems with more global
voltage supplies and a high-performance, leaky process when the performance
requirement is loose, while it favors systems with fewer global voltage
supplies and a low-power, less leaky process when the performance requirement
is tight; (3) increasing the number of global voltage supplies or the leakage
ratio can reduce the optimal core count by 22% and 50%, respectively.
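The optimization problem stated above can be made concrete with a toy model. The sketch below uses exhaustive search rather than the genetic algorithm proposed in the dissertation, and it rests on simplifying assumptions that are not taken from the source: all active cores share one voltage/frequency level, throughput scales linearly with core count and frequency, and per-core power is dynamic power plus a constant leakage current. The constants `C_EFF` and `I_LEAK` and the level table are hypothetical.

```python
from itertools import product

# Hypothetical voltage/frequency pairs (volts, hertz).
VF_LEVELS = [(0.7, 300e6), (0.9, 600e6), (1.1, 900e6), (1.3, 1200e6)]
C_EFF = 1e-10   # effective switched capacitance per core in farads (assumed)
I_LEAK = 0.01   # leakage current per active core in amperes (assumed)


def core_power(v: float, f: float) -> float:
    """Power of one active core: dynamic (C * V^2 * f) plus leakage (V * I)."""
    return C_EFF * v * v * f + v * I_LEAK


def best_config(target_ops: float, max_cores: int):
    """Return (power, cores, volts, hertz) with minimum total power that
    meets the throughput target, or None if no configuration is feasible."""
    best = None
    for n, (v, f) in product(range(1, max_cores + 1), VF_LEVELS):
        if n * f < target_ops:
            continue  # performance constraint not met
        power = n * core_power(v, f)
        if best is None or power < best[0]:
            best = (power, n, v, f)
    return best
```

With these assumed constants, a call such as `best_config(1.2e9, 8)` selects more cores at the lowest voltage over fewer cores at a higher one, which illustrates why the active core count is a useful extra dimension. Raising `I_LEAK` penalizes each additional active core and pushes the optimum toward fewer cores. Exhaustive search is only practical here because the space is tiny; with per-core levels the space grows exponentially, which motivates a heuristic search.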
Bin Liu, "Energy-Efficient Computing with Fine-Grained Many-Core Systems," Ph.D. Dissertation, Technical Report ECE-VCL-2016-1, VLSI Computation Laboratory, ECE Department, University of California, Davis, 2016.
@phdthesis{bliu:vcl:phdthesis,
  author  = {Bin Liu},
  title   = {Energy-Efficient Computing with Fine-Grained Many-Core Systems},
  school  = {University of California, Davis},
  year    = 2016,
  address = {Davis, CA, USA},
  month   = sep,
  note    = {\url{http://vcl.ece.ucdavis.edu/pubs/theses/2016-1/}}
}