This dissertation investigates the architectural design, physical implementation, evaluation of results, and feature analysis of a multi-core processor for DSP applications. The system is composed of a 2-D array of simple single-issue programmable processors interconnected by a reconfigurable mesh network. The processors operate completely asynchronously with respect to one another in a Globally Asynchronous Locally Synchronous (GALS) fashion. The system is called the Asynchronous Array of simple Processors (AsAP). A 6 x 6 array has been fabricated in a 0.18 μm CMOS technology. The physical design addresses timing issues to ensure a robust implementation and takes full advantage of the architecture's potential scalability. Each processor occupies 0.66 mm², is fully functional at clock rates of 520 to 540 MHz at 1.8 V, and dissipates 94 mW when its clock is 100% active. Compared with the high-performance TI C62x DSP processor, AsAP achieves 0.8 to 9.6 times the performance and 10 to 75 times the energy efficiency, with an area 7 to 19 times smaller. The system is also easily scalable and well suited to future fabrication technologies.
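As a quick consistency check on these figures, the energy per clock cycle follows from E = P / f. The abstract does not state the clock rate at which the 94 mW was measured. This sketch therefore assumes the 94 mW figure and evaluates both ends of the 520 to 540 MHz range. The aggregate area multiplies the per-processor area by 36 and ignores pads and other chip-level overhead. The helper `energy_per_cycle_pj` is a hypothetical name used only for illustration.

```python
# Figures quoted in the abstract for one AsAP processor (0.18 um CMOS, 1.8 V).
POWER_W = 94e-3   # dissipation with the clock 100% active
AREA_MM2 = 0.66   # area of a single processor
ARRAY_SIDE = 6    # the fabricated chip is a 6 x 6 array


def energy_per_cycle_pj(power_w: float, clock_hz: float) -> float:
    """Energy dissipated per clock cycle, in picojoules (E = P / f)."""
    return power_w / clock_hz * 1e12


# Assumption: the 94 mW figure is applied at both ends of the quoted clock range.
for f_mhz in (520, 540):
    print(f"{f_mhz} MHz: {energy_per_cycle_pj(POWER_W, f_mhz * 1e6):.0f} pJ/cycle")

# Aggregate processor area only; I/O pads and other chip-level overhead are ignored.
print(f"array core area: {ARRAY_SIDE ** 2 * AREA_MM2:.2f} mm^2")
```

Under these assumptions, each processor dissipates roughly 174 to 181 pJ per cycle, and the 36 processors together occupy about 23.8 mm².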
An asymmetric interprocessor communication architecture is also proposed. It assigns different buffer resources to the nearest-neighbor interconnect and the long-distance interconnect. Compared with a traditional Network on Chip (NoC), this reduces the communication circuitry area by approximately 2 to 4 times while providing similar routing capability. A wide design space is explored, including support for long-distance communication in GALS systems, static versus dynamic routing, varying numbers of ports (buffers) for the processing core, and varying numbers of links at each edge.
A GALS design style typically introduces a performance penalty because of the additional communication latency between clock domains. However, GALS chip multiprocessors with large inter-processor FIFOs, such as AsAP, can inherently hide much of this penalty, which can even be driven to zero. Furthermore, adaptive clock and voltage scaling for each processor provides approximately 40% power savings without any performance reduction.
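Per-processor scaling saves power because dynamic CMOS power scales roughly as P ∝ C V² f. A processor that can run slower can therefore also run at a lower supply voltage. The sketch below computes the dynamic power of a lightly loaded processor relative to full speed. It assumes the processor needs only half of an assumed 530 MHz nominal clock and can then operate at a hypothetical 1.3 V, which is not a measured AsAP operating point. Leakage is ignored, and `relative_dynamic_power` is a hypothetical helper name.

```python
def relative_dynamic_power(v: float, f: float, v_nom: float, f_nom: float) -> float:
    """Dynamic power at (v, f) relative to (v_nom, f_nom), using P ~ C * V^2 * f.

    The switched capacitance C and the activity factor cancel in the ratio;
    leakage power is ignored.
    """
    return (v / v_nom) ** 2 * (f / f_nom)


# Nominal point: 1.8 V is from the abstract; 530 MHz is an assumed mid-range clock.
V_NOM, F_NOM = 1.8, 530e6
# Hypothetical reduced point for a processor that needs only half the throughput.
V_LOW, F_LOW = 1.3, 265e6

ratio = relative_dynamic_power(V_LOW, F_LOW, V_NOM, F_NOM)
print(f"relative power: {ratio:.2f}, savings: {1 - ratio:.0%}")
```

With these assumed numbers, the slowed processor uses about a quarter of its full-speed dynamic power. An array-level figure mixes such lightly loaded processors with processors that must stay near full speed, so it depends on how the workload is distributed across the array.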
Zhiyi Yu, "High Performance and Energy Efficient Multi-core Systems for DSP Applications," Technical Report ECE-CE-2007-5, Computer Engineering Research Laboratory, ECE Department, University of California, Davis, 2007.
@phdthesis{zhyyu:phdthesis, author = {Zhiyi Yu}, title = {High Performance and Energy Efficient Multi-core Systems for DSP Applications}, school = {University of California}, year = 2007, address = {Davis, CA, USA}, month = Oct, note = {\url{http://www.ece.ucdavis.edu/vcl/pubs/theses/2007-5}} }