Programmable Processors Designed in Academia

This page contains four tables listing key attributes of programmable processors designed in academia, such as clock rate, power, and die size. The tables cover general-purpose and special-purpose processors, each divided into those designed in universities and those co-designed by universities with companies or research labs.




General-Purpose Programmable Processors Designed in Universities

Year | Processor | Clock Rate (MHz) | CMOS Tech (nm) | Die Size (mm^2) | Die Size Scaled to 22nm (mm^2)* | Voltage (V) | Per Core Power (mW) | Energy | University, P.I. | Reference
1999 | PipeRench | 120 | 180 | 55.48 | 0.666 | 1.8 | 675.0 mW @1.8V, 33MHz | - | CMU, Schmit | [1] [2]
2000 | Pleiades (reconfigurable) | 40 | 250 | 5.07 ? | 0.03154 ? † | - | 18.04 ? mW | - | UC Berkeley, Rabaey | [3]
2000 | FLOVA | 100 | 350 | 100 | 0.3174 † | 3.3 | - | - | KAIST, Kyung | [4] [5]
2000 | Dynamic Voltage Processor | - | 600 | 67.5 | 0.0729 † | 1.2 | 476 mW # | 38.2 pJ/instruction | UC Berkeley, Brodersen | [6]
2002 | VIRAM | 200 | 180 | 270 | 3.24 | 1.2 | 2000 mW | 31.25 pJ/operation | UC Berkeley, Patterson | [7] [8]
2002 | Low Power RISC Processor | - | 600 | 4.2 (core) | 0.0045 † | 0.5 | - | - | Univ. of Tokyo, Sakurai | [9]
2003 | RAW - 16 Cores | 425 | 180 | 331.24 | 3.975 | - | 1562.5 mW ** | - | MIT, Agarwal | [10]
2003 | Processor and Configurable Logic | 100 | 350 | 64 | 0.2031 † | 3.3 | 370.0 mW @3.3V | - | KAIST, Park | [11]
2004 | HiBRID-SoC | 145 | 180 | 82 | 0.984 | - | 3500 mW | - | Univ. of Hannover, Pirsch | [12]
2005 | SoC with Reconfig. I/O Module | 166 | 130 | 42.88 | 1.501 | 1.2 | 340 mW | - | University of Bologna, Guerrieri | [13]
2006 | AsAP 1 - 36 Cores | 600 | 180 | 32.1 | 0.3852 | 2.0 | 2.4 mW @0.9V, 116MHz; 32 mW @1.8V, 475MHz | 93.0 pJ/Op = 0.093 mW/MHz; 300 pJ/Op = 0.3 mW/MHz @1.8V | UC Davis, Baas | [14]
2008 | AsAP 2 - 167 Cores | 1200 | 65 | 39.44 | 5.916 | 1.3 | 0.608 mW @0.675V, 66MHz; 3.4 mW @0.75V, 260MHz; 47 mW @1.2V, 1.06GHz; 62 mW @1.3V, 1.2GHz | 5.9 pJ/Op @…, 1GHz; 32 pJ/Op = 0.032 mW/MHz, 100% active | UC Davis, Baas | [19]
2008 | Multi-Core Stream Processor | 200 | 90 | - | - | 1 | 26 mW | 1.625 pJ/operation | National Taiwan Univ., Chen | [20]
2008 | Phoenix | 0.106 ? | 180 | 0.837 | 0.01 | 0.5 | 0.0002968 mW @0.5V, 106kHz | 2.8 pJ/cycle | University of Michigan, Sylvester | [21]
2008 | Sub Vt Microcontroller | 1.0 | 65 | 4.2594 | 0.639 | 0.3-0.6 | 0.0118048 mW @0.5V, 434kHz § | 27.2 pJ/cycle @500mV | MIT, Chandrakasan | [23]
2010 | ASPA2 | 75 | 180 | 17.6256 Δ | 0.212 | 1.8 | 41.4 mW @1.8V, 75MHz | 2.68 pJ/operation | University of Manchester | [32]
2011 | 3D-MAPS - 64 Cores | 277 | 130 | 25 | 0.875 | 1.5 | 62.5 mW @1.5V, 277MHz | - | Georgia Tech, Lee | [26] [27]
2011 | ePUMA - 9 Cores | - | 65 | 23 | 3.45 | - | 444.44 mW | - | Linköping University, Kessler | [28] [29]
2012 | 16-Core Processor with Message-Passing | 800 | 65 | 9.1 | 1.365 | 1.2 | 34.0 mW @1.2V, 750MHz | 45.0 pJ/operation = 0.045 mW/MHz | Fudan University | [33]
2016 | KiloCore | 1782 @1.1V | 32 | 64 | 31.36 | 1.1 | 0.67 mW @0.56V, 115MHz | 5.8 pJ/Op @0.56V, 115MHz | UC Davis, Baas | [44]

Notes:

     *     The die size is scaled to 22nm CMOS Technology.

     †     The area is first scaled to 180 nm CMOS technology using ideal geometric scaling: it is multiplied by 1/S², where S is the ratio of the original feature size to 180 nm (e.g., S = 250/180 for a 250 nm design). The result is then scaled to 22 nm using the 180 nm factor from the scale-factor table at the end of this page, 0.012.

     ‡     Includes an ARM8 processor core.

     #     Throughput: 6-85 MIPS at 0.54-5.6 mW/MIPS. The listed power is the highest value, 85 MIPS × 5.6 mW/MIPS = 476 mW.

     **   Total chip power is 25,000 mW for 16 cores; per-core power = 25,000 mW / 16 = 1562.5 mW.

     §     27.2 pJ/cycle at Vdd = 500 mV; power = 27.2 pJ/cycle × 434 kHz = 0.0118048 mW.

     Δ     Single cell area is 51 µm × 54 µm; die size = 51 × 54 µm² × (80 × 80 cells) × 10⁻⁶ mm²/µm² = 17.6256 mm².
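The derived power and area values in the notes above follow from simple unit arithmetic. The short Python sketch below reproduces them. It assumes that average power equals energy per event (cycle, instruction, or operation) multiplied by the event rate. The helper names (`power_mw`, `pj_per_op_to_mw_per_mhz`, `cell_array_area_mm2`) are hypothetical and serve only to illustrate the conversions. The same rule links the pJ/Op and mW/MHz figures in the AsAP rows: 1 mW/MHz = 10⁻⁹ J = 1000 pJ per operation.

```python
"""Sketch of the footnote arithmetic for the general-purpose university table.

Assumption: average power = energy per event x event rate.
Helper names are illustrative only; they do not come from the source page.
"""

PJ_TO_J = 1e-12     # joules per picojoule
W_TO_MW = 1e3       # milliwatts per watt
UM2_TO_MM2 = 1e-6   # square millimetres per square micrometre


def power_mw(energy_pj: float, rate_hz: float) -> float:
    """Average power in mW from energy per event (pJ) and events per second."""
    return energy_pj * PJ_TO_J * rate_hz * W_TO_MW


def pj_per_op_to_mw_per_mhz(energy_pj: float) -> float:
    """1 mW/MHz = 1e-3 W / 1e6 Hz = 1e-9 J = 1000 pJ per operation."""
    return energy_pj / 1000.0


def cell_array_area_mm2(cell_w_um: float, cell_h_um: float, cols: int, rows: int) -> float:
    """Total area of a cols x rows array of identical cells, in mm^2."""
    return cell_w_um * cell_h_um * cols * rows * UM2_TO_MM2


dvp_power = 85 * 5.6                              # note #: 85 MIPS x 5.6 mW/MIPS
raw_power = 25_000 / 16                           # note **: chip power / 16 cores
subvt_power = power_mw(27.2, 434e3)               # note §: 27.2 pJ/cycle at 434 kHz
phoenix_power = power_mw(2.8, 106e3)              # Phoenix row: 2.8 pJ/cycle at 106 kHz
aspa2_area = cell_array_area_mm2(51, 54, 80, 80)  # note Δ: 80 x 80 cells
asap1_mw_per_mhz = pj_per_op_to_mw_per_mhz(93.0)  # AsAP 1 row

print(f"Dynamic Voltage Processor: {dvp_power:.1f} mW")   # 476.0 mW
print(f"RAW per core: {raw_power:.1f} mW")                # 1562.5 mW
print(f"Sub Vt Microcontroller: {subvt_power:.7f} mW")    # 0.0118048 mW
print(f"Phoenix: {phoenix_power:.7f} mW")                 # 0.0002968 mW
print(f"ASPA2 die: {aspa2_area:.4f} mm^2")                # 17.6256 mm^2
print(f"AsAP 1: {asap1_mw_per_mhz:.3f} mW/MHz")           # 0.093 mW/MHz
```

Each printed value matches the corresponding table entry or note, which supports reading the notes as straightforward energy × rate and total ÷ count calculations.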



Special-Purpose Programmable Processors Designed in Universities

Year | Processor | Number of Cores | Clock Rate (MHz) | CMOS Tech (nm) | Die Size (mm^2) | Die Size Scaled to 22nm (mm^2)* | Voltage (V) | Per Core Power (mW) | Energy | Application | University, P.I. | Reference
2006 | SAMIRA | 1 ? | 212 | 130 | 2.4 | 0.084 | - | 360.4 mW † | 1.7 mW/MHz | Baseband signal processing applications | Dresden Univ. of Tech, Fettweis | [15]
2008 | NoC-Based Parallel Processor | 67 ‡ | 200 | 130 | 36 | 1.26 | 1.2 | 8.7 mW @1.2V, 200MHz | 4.664 pJ/operation | Real-time Object Recognition | KAIST, Yoo | [22]
2008 | Sub Vt Sensor Processor | 1 | 0.833 | 130 | 0.029768 (core); 0.055205 (memory) | 0.001 | 0.2-1.2 | 0.0006 mW # | 2.6 pJ/instruction @360mV, 833kHz | Sensor Applications | University of Michigan, Sylvester | [24]
2008 | Fully Programmable 3-D Graphics Processor | 1 | 100 | 130 | 9.3 | 0.3255 | 1.2 | 195.0 mW @1.2V, 100MHz | - | Full 3-D Graphics Processing, Low-Power Mobile Devices | KAIST, Hoi-Jun Yoo | [34]
2009 | Real-Time Multi-Object Recognition Processor | 18 | 400 | 130 | 49 | 1.715 | 1.2 | 27.556 mW @1.2V ** | 2.463 pJ/operation | Multi-Object Recognition | KAIST, Hoi-Jun Yoo | [35]
2009 | Programmable Baseband Processor | 1 | 240 | 120 | 11 | 0.506 | - | 70.0 mW @70MHz | - | Mobile WiMAX and DVB-T/H | Linköping University | [36]
2010 | Heterogeneous Many-Core Processor | 33 | 200 | 130 | 50 | 1.75 | 1.2 | 10.455 mW ** | 4.065 pJ/operation | Object Recognition | KAIST, Hoi-Jun Yoo | [37]
2011 | Heterogeneous Multimedia Processor | 1 | 200 | 130 | 16 | 0.56 | 1.2 | 275 mW | 7.161 pJ/operation | Multimedia Applications | KAIST, Kim | [25]
2013 | 24-Core Processor for Multimedia and Communication Applications | 24 | 850 | 65 | 18.8 | 2.82 | 1.2 | 22 mW ** | 25.641 pJ/operation | Multimedia and communication applications | Fudan University | [38]
2013 | Multi-Classifier Many-Core Processor | 21 | 200 § | 130 | 25 | 0.875 | 0.65-1.2 | 12.381 mW ** | 1.548 pJ/operation | Object Recognition | KAIST, Hoi-Jun Yoo | [39]
2014 | EEG Neuro-Feedback Processor | 1 | 20 | 130 | 11.75 | 0.4113 | 0.7-1.0 | 4.45 mW | - | Mental-health Management | KAIST | [40]
2014 | Augmented Reality Multicore Processor | 1 | 250 | 65 | 32 | 4.8 | 0.7-1.2 | 381.0 mW @1.2V, 250MHz | 1.52 mW/MHz | Head-mounted display application | KAIST, Hoi-Jun Yoo | [41]

Notes:

     *: The die size is scaled to 22nm CMOS Technology.

     †: Power calculated as 1.7 mW/MHz × 212 MHz = 360.4 mW.

     ‡: 1 main processor, 64 processing elements, 1 visual attention engine (VAE), and 1 matching accelerator.

     #: 0.85 pJ/instruction at 0.04 MIPS and 1.2 pJ/instruction at 0.5 MIPS; power calculated as 1.2 pJ × 0.5 MIPS = 0.0006 mW.

     **: Per-core power calculated by dividing total chip power by the number of cores.

     §: With DVFS, the clock ranges from 50 to 200 MHz.
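Several per-core power and energy entries in this table are derived from chip-level figures, some of which appear in the titles of the cited papers (e.g., 496 mW and 201.4 GOPS in [35], 39 GOPS/W in [38]). The minimal Python sketch below reproduces those derivations. It assumes that per-core power is total power divided by core count (note **) and that energy per operation is power divided by throughput. The function names are hypothetical.

```python
"""Sketch of derived power/energy values in the special-purpose university table.

Assumptions: per-core power = chip power / cores (note **), and
energy per operation = power / throughput. Helper names are illustrative only.
"""


def per_core_power_mw(total_mw: float, cores: int) -> float:
    """Note **: divide total chip power by the number of cores."""
    return total_mw / cores


def pj_per_op(total_mw: float, gops: float) -> float:
    """1 mW / 1 GOPS = 1e-3 W / 1e9 op/s = 1e-12 J, so mW / GOPS is pJ/op."""
    return total_mw / gops


def pj_per_op_from_efficiency(gops_per_w: float) -> float:
    """1 GOPS/W = 1e9 op/J, so pJ/op = 1000 / (GOPS/W)."""
    return 1000.0 / gops_per_w


def power_from_rate_mw(energy_pj: float, mips: float) -> float:
    """1 pJ x 1e6 op/s = 1e-6 W = 1e-3 mW."""
    return energy_pj * mips * 1e-3


# [22]: 583 mW, 125 GOPS, 67 cores (table: 8.7 mW, 4.664 pJ/operation)
print(f"{per_core_power_mw(583, 67):.1f} mW, {pj_per_op(583, 125):.3f} pJ/op")
# [35]: 496 mW, 201.4 GOPS, 18 cores (table: 27.556 mW, 2.463 pJ/operation)
print(f"{per_core_power_mw(496, 18):.3f} mW, {pj_per_op(496, 201.4):.3f} pJ/op")
# [37]: 345 mW, 33 cores (table: 10.455 mW)
print(f"{per_core_power_mw(345, 33):.3f} mW")
# [38] 39 GOPS/W and [39] 646 GOPS/W (table: 25.641 and 1.548 pJ/operation)
print(f"{pj_per_op_from_efficiency(39):.3f} pJ/op, {pj_per_op_from_efficiency(646):.3f} pJ/op")
# Note †, SAMIRA: 1.7 mW/MHz x 212 MHz (table: 360.4 mW)
print(f"{1.7 * 212:.1f} mW")
# Note #, Sub Vt Sensor Processor: 1.2 pJ x 0.5 MIPS (table: 0.0006 mW)
print(f"{power_from_rate_mw(1.2, 0.5):.4f} mW")
```

Because 1 mW/GOPS equals exactly 1 pJ/op, the energy column can be checked without any exponent bookkeeping.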



General-Purpose Programmable Processors Co-Designed by Universities and Companies or Research Labs

Year | Processor | Clock Rate (MHz) | CMOS Tech (nm) | Die Size (mm^2) | Die Size Scaled to 22nm (mm^2)* | Voltage (V) | Per Processor Power (mW) | Energy | University, P.I. | Cooperator | Reference
2001 | Imagine | 400 | 150 | 260 | 6.708 | 1.5 | - | - | Stanford, Dally | Texas Instruments | [30]
2006 | Razor | 200 | 180 | 9.9 | 0.119 | 1.2-1.8 | 425 mW @1.8V, 200MHz | - | University of Michigan, Mudge | ARM | [18]
2006 | TRIPS - 2 Cores | 366 | 130 | 334 | 11.69 | - | - | - | UT Austin, Burger | IBM, Intel, and Sun Microsystems | [16] [17]
2013 | Quality Programmable Vector Processors for Approximate Computing | 250 | 45 | 2.6 | 0.598 | - | 1.27266 mW | - | Purdue University, Raghunathan | NEC Laboratories America | [31]

Notes:

     *: The die size is scaled to 22nm CMOS Technology.



Special-Purpose Programmable Processors Co-Designed by Universities and Companies or Research Labs

Year | Processor | Clock Rate (MHz) | CMOS Tech (nm) | Die Size (mm^2) | Die Size Scaled to 22nm (mm^2)* | Voltage (V) | Per Processor Power (mW) | Energy | Application | University, P.I. | Cooperator | Reference
2010 | Dynamically programmable image processor | 40 | 180 | 25 | 0.3 | 1.8 | 1000 mW @200MHz | - | compact vision systems | University of Michigan, Mudge | Vision & Control GmbH | [42]

Notes:

     *: The die size is scaled to 22nm CMOS Technology.



     Scale Factors for Scaling Die Size to 22nm CMOS Technology

CMOS Tech (nm) | 180 | 150 | 130 | 120 | 90 | 65 | 55 | 45 | 40 | 32 | 28 | 22
Scale Factor | 0.012 | 0.026* | 0.035 | 0.046* | 0.08 | 0.15 | 0.19* | 0.23 | 0.33* | 0.49 | 0.694* | 1

Notes:

     The data in this table come from Table VII of [43]. Each scale factor is the geometric mean of three measures: minimum feature size, Metal 1 half-pitch, and (4T) logic gate size.

     Scale factors followed by '*' were derived from the original data by linear interpolation.

     CMOS technologies larger than 180 nm, such as 250 nm and 600 nm, are not scaled with this table, because extending the linear interpolation beyond 180 nm would produce a negative scale factor.
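The starred scale factors are consistent with straight-line interpolation between neighboring unstarred entries. For example, 150 nm lies 40% of the way from 130 nm to 180 nm, which gives 0.035 + 0.4 × (0.012 - 0.035) = 0.0258 ≈ 0.026. The Python sketch below assumes exactly that scheme for nodes from 22 nm to 180 nm. For larger nodes it applies the two-step † method: ideal 1/S² scaling to 180 nm, followed by the 0.012 factor. The function names are hypothetical. Under these assumptions, the sketch reproduces the listed scaled areas for the Pleiades, FLOVA, Imagine, and KiloCore rows. Imagine's 6.708 mm² appears to use the unrounded 150 nm factor (0.0258) rather than the rounded 0.026.

```python
"""Sketch: scale a die area to 22 nm using the tabulated factors.

Assumptions: starred factors are linear interpolations in feature size between
the unstarred entries, and nodes above 180 nm use the two-step method of note †.
Function names are illustrative only.
"""

# Unstarred entries of the scale-factor table (Table VII of [43]).
ANCHORS = {180: 0.012, 130: 0.035, 90: 0.08, 65: 0.15, 45: 0.23, 32: 0.49, 22: 1.0}


def scale_factor(node_nm: float) -> float:
    """Area scale factor from node_nm to 22 nm, for 22 <= node_nm <= 180."""
    if node_nm in ANCHORS:
        return ANCHORS[node_nm]
    nodes = sorted(ANCHORS)  # ascending: 22, 32, 45, 65, 90, 130, 180
    for lo, hi in zip(nodes, nodes[1:]):
        if lo < node_nm < hi:
            t = (node_nm - lo) / (hi - lo)  # fractional position between anchors
            return ANCHORS[lo] + t * (ANCHORS[hi] - ANCHORS[lo])
    raise ValueError(f"{node_nm} nm is outside the 22-180 nm table range")


def scaled_area_mm2(area_mm2: float, node_nm: float) -> float:
    """Die area scaled to 22 nm."""
    if node_nm > 180:
        # Note †: ideal geometric scaling to 180 nm (1/S^2, S = node/180),
        # then the tabulated 180 nm -> 22 nm factor.
        s = node_nm / 180
        return area_mm2 / s**2 * ANCHORS[180]
    return area_mm2 * scale_factor(node_nm)


for name, area, node in [
    ("Pleiades", 5.07, 250),  # table: 0.03154
    ("FLOVA", 100, 350),      # table: 0.3174
    ("Imagine", 260, 150),    # table: 6.708
    ("KiloCore", 64, 32),     # table: 31.36
]:
    print(f"{name}: {scaled_area_mm2(area, node):.5f} mm^2")
```

Interpolating in feature size, rather than fitting a curve, keeps the sketch faithful to the note that starred values come from linear interpolation. The same limitation applies here as in the table: the function refuses to extrapolate beyond 180 nm.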



References

[1] Schmit, Herman, et al. "PipeRench: A virtualized programmable datapath in 0.18 micron technology." Custom Integrated Circuits Conference, 2002. Proceedings of the IEEE 2002. IEEE, 2002.

[2] https://www.ece.cmu.edu/research/piperench/index.html.

[3] Wan, Marlene, et al. "Design methodology of a low-energy reconfigurable single-chip DSP system." Journal of VLSI signal processing systems for signal, image and video technology 28.1-2 (2001): 47-61.

[4] Youn, Daehan, Ohyoung Song, and Hoon Chang. "Design-for-testability of the FLOVA." Proceedings of the Second IEEE Asia Pacific Conference. 2000.

[5] http://vswww.kaist.ac.kr/KOREAN/research_project_FLOVA.php.

[6] Burd, Thomas D., et al. "A dynamic voltage scaled microprocessor system." Solid-State Circuits, IEEE Journal of 35.11 (2000): 1571-1580.

[7] Kozyrakis, Christoforos, et al. "Hardware/compiler codevelopment for an embedded media processor." Proceedings of the IEEE 89.11 (2001): 1694-1709.

[8] http://iram.cs.berkeley.edu/

[9] Nose, Koichi, et al. "VTH-hopping scheme to reduce subthreshold leakage for low-power processors." Solid-State Circuits, IEEE Journal of 37.3 (2002): 413-419.

[10] Taylor, Michael Bedford, et al. "The Raw microprocessor: A computational fabric for software circuits and general-purpose programs." Micro, IEEE 22.2 (2002): 25-35.

[11] Bae, Young-Don, Seong-Il Park, and In-Cheol Park. "A single-chip programmable platform based on a multithreaded processor and configurable logic clusters." Solid-State Circuits, IEEE Journal of 38.10 (2003): 1703-1711.

[12] Stolberg, H-J., et al. "HiBRID-SoC: A multi-core SoC architecture for multimedia signal processing." Signal Processing Systems, 2003. SIPS 2003. IEEE Workshop on. IEEE, 2003.

[13] Bocchi, Massimo, et al. "Design and implementation of a reconfigurable heterogeneous multiprocessor SoC." Custom Integrated Circuits Conference, 2006. CICC'06. IEEE. IEEE, 2006.

[14] Yu, Zhiyi, et al. "AsAP: An asynchronous array of simple processors." Solid-State Circuits, IEEE Journal of 43.3 (2008): 695-705.

[15] Matus, Emil, et al. "A GFLOPS vector-DSP for broadband wireless applications." Custom Integrated Circuits Conference, 2006. CICC'06. IEEE. IEEE, 2006.

[16] http://www.cs.utexas.edu/users/cart/trips/publications/micro06_trips.pdf.

[17] Gratz, Paul, et al. "On-chip interconnection networks of the TRIPS chip." Micro, IEEE 27.5 (2007): 41-50.

[18] Ernst, Dan, et al. "Razor: A low-power pipeline based on circuit-level timing speculation." Microarchitecture, 2003. MICRO-36. Proceedings. 36th Annual IEEE/ACM International Symposium on. IEEE, 2003.

[19] Truong, Dean, et al. "A 167-processor 65 nm computational platform with per-processor dynamic supply voltage and dynamic clock frequency scaling." Symposium on VLSI Circuits. 2008.

[20] Tsao, You-Ming, et al. "A 26 mW 6.4 GFLOPS multi-core stream processor for mobile multimedia applications." Dig. Tech. Papers Symp. VLSI Circuits. 2008.

[21] Seok, Mingoo, et al. "The Phoenix Processor: A 30pW platform for sensor applications." VLSI Circuits, 2008 IEEE Symposium on. IEEE, 2008.

[22] Kim, Kwanho, et al. "A 125 GOPS 583 mW network-on-chip based parallel processor with bio-inspired visual attention engine." Solid-State Circuits, IEEE Journal of 44.1 (2009): 136-147.

[23] Kwong, J., et al. "A 65nm Sub-Vt Microcontroller with Integrated SRAM and Switched-Capacitor DC-DC Converter." Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2008 IEEE International. IEEE, Feb. 2008: 318-319.

[24] Zhai, Bo, et al. "Energy-efficient subthreshold processor design." Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 17.8 (2009): 1127-1137.

[25] Kim, Hyo-Eun, et al. "A reconfigurable heterogeneous multimedia processor for IC-stacking on Si-interposer." Circuits and Systems for Video Technology, IEEE Transactions on 22.4 (2012): 589-604.

[26] Healy, Michael B., et al. "Design and analysis of 3D-MAPS: A many-core 3D processor with stacked memory." CICC. 2010.

[27] http://www.gtcad.gatech.edu/3d-maps/

[28] Wang, Jian, et al. "ePUMA: A novel embedded parallel DSP platform for predictable computing." Education Technology and Computer (ICETC), 2010 2nd International Conference on. Vol. 5. IEEE, 2010.

[29] Wang, Jian, Joar Sohl, and Dake Liu. "Architectural support for reducing parallel processing overhead in an embedded multiprocessor." Embedded and Ubiquitous Computing (EUC), 2010 IEEE/IFIP 8th International Conference on. IEEE, 2010.

[30] Kapasi, Ujval J., et al. "The Imagine stream processor." Computer Design: VLSI in Computers and Processors, 2002. Proceedings. 2002 IEEE International Conference on. IEEE, 2002.

[31] Venkataramani, Swagath, et al. "Quality programmable vector processors for approximate computing." Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 2013.

[32] Lopich, Alexey, and Piotr Dudek. "An 80×80 general-purpose digital vision chip in 0.18 µm CMOS technology." Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on. IEEE, 2010.

[33] Yu, Zhiyi, et al. "An 800MHz 320mW 16-core processor with message-passing and shared-memory inter-core communication mechanisms." Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International. IEEE, 2012.

[34] Woo, Jeong-Ho, et al. "A 195 mW, 9.1 MVertices/s fully programmable 3-D graphics processor for low-power mobile devices." Solid-State Circuits, IEEE Journal of 43.11 (2008): 2370-2380.

[35] Kim, Joo-Young, et al. "A 201.4 GOPS 496 mW real-time multi-object recognition processor with bio-inspired neural perception engine." Solid-State Circuits, IEEE Journal of 45.1 (2010): 32-45.

[36] Nilsson, Anders, Eric Tell, and Dake Liu. "An 11 mm², 70 mW fully programmable baseband processor for mobile WiMAX and DVB-T/H in 0.12 µm CMOS." Solid-State Circuits, IEEE Journal of 44.1 (2009): 90-97.

[37] Lee, Seungjin, et al. "A 345 mW heterogeneous many-core processor with an intelligent inference engine for robust object recognition." Solid-State Circuits, IEEE Journal of 46.1 (2011): 42-51.

[38] Ou, Peng, et al. "A 65nm 39GOPS/W 24-core processor with 11Tb/s/W packet-controlled circuit-switched double-layer network-on-chip and heterogeneous execution array." Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International. IEEE, 2013.

[39] Park, Junyoung, et al. "A 646GOPS/W multi-classifier many-core processor with cortex-like architecture for super-resolution recognition." Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International. IEEE, 2013.

[40] Roh, Taehwan, et al. "18.5 A 2.14 mW EEG neuro-feedback processor with transcranial electrical stimulation for mental-health management." Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International. IEEE, 2014.

[41] Kim, Gyeonghoon, et al. "A 1.22 TOPS and 1.52 mW/MHz Augmented Reality Multicore Processor With Neural Network NoC for HMD Applications." Solid-State Circuits, IEEE Journal of 50.1 (2015): 113-124.

[42] Loos, Andreas, et al. "Dynamically programmable image processor for compact vision systems." Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on. IEEE, 2010.

[43] Stillmaker, Aaron, Zhibin Xiao, and Bevan Baas. "Toward more accurate scaling estimates of CMOS circuits from 180 nm to 22 nm." VLSI Computation Lab, ECE Department, University of California, Davis, Tech. Rep. ECE-VCL-2011-4, 2011.

[44] Bohnenstiehl, Brent, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, and Bevan Baas. "A 5.8 pJ/Op 115 Billion Ops/sec, to 1.78 Trillion Ops/sec 32nm 1000-Processor Array." IEEE Symposium on VLSI Circuits, Honolulu, HI, June 2016.




This page is maintained by members of the VLSI Computation Laboratory at UC Davis. Please contact any of us with corrections or additions; thank you.

Updates:
2016/06/19 Minor updates
2019/05/09 Minor updates