

## **Outline**

- Introduction and Motivation
  - Core Scaling Trend
- KiloCore Architectural Design
- Physical Chip Design
  - Chip Flow
  - Design Challenges
- Final Die
- Application Measurements









## **KiloCore Design**

- Contains exactly 1,000 processors on one chip
- One of the first fabricated chips to contain 1,000 processors
- Highest-clock-rate processor designed at a university
- The full set of libraries was not received until 34 days before tapeout
- 12 memories of 64 KB each, for 768 KB of shared memory
  - Each memory is directly accessible by the two processors above it



#### **GALS Clocking**

- KiloCore contains a fully independent Globally-Asynchronous Locally-Synchronous (GALS) clock domain in each of its 1000 processors, 1000 packet routers, and 12 independent memories
  - Processor programmable clock oscillators are ~1% of tile area
  - Router oscillators are simplified and very small
- Each of the 2012 clock oscillators is placed inside its own clock domain; there are no global clock signals (except three for configuration and testing)
- Each clock oscillator is fully unconstrained: oscillators may change frequency (below their $f_{max}$), halt, and restart arbitrarily to minimize power consumption
- Data transfer across clock domains is handled by dual-clock FIFOs
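The slides do not say how KiloCore's dual-clock FIFOs pass their read and write pointers between clock domains. A common approach in dual-clock FIFO design generally is to Gray-code the pointers, so that only one bit changes per increment and a pointer sampled mid-transition is off by at most one position. The Python sketch below illustrates that general idea only. It is an assumption for illustration, not KiloCore's circuit, and every function name in it is hypothetical.

```python
"""Gray-coded FIFO pointers: an illustrative sketch, not KiloCore's design."""


def bin_to_gray(n: int) -> int:
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Recover the binary count from a reflected Gray code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which two pointer values differ."""
    return bin(a ^ b).count("1")


def pointer_sequence(pointer_bits: int) -> list[int]:
    """Gray-coded values that a pointer of the given width steps through."""
    return [bin_to_gray(i) for i in range(1 << pointer_bits)]


if __name__ == "__main__":
    seq = pointer_sequence(4)  # hypothetical 4-bit pointer width
    # Compare each value with its successor, including the wrap back to zero.
    worst = max(bits_changed(a, b) for a, b in zip(seq, seq[1:] + seq[:1]))
    print("max bits changed per increment:", worst)
```

Because the sequence length is a power of two, the wrap from the last value back to zero also changes a single bit, which is why FIFO depths in this style are usually chosen as powers of two.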








### **Processor Tile Implementation**

- Relatively simple processor design
  - No hard macros
  - Oscillator was pre-placed and routed
  - I/O pin locations specified for optimal abutment
  - Power rails specified
- Quick design iterations
  - Design aspects such as memory size or tile size were easy to change
  - Verilog was changing up until 5 days before tapeout
- ~580k transistors per processor tile
- Tile size: 239 µm "wide" × 232.3 µm "tall"
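The tile dimensions and transistor count above imply a tile area and a transistor density. The short Python sketch below works them out. The constant and function names are hypothetical, "~580k" is treated as exactly 580,000 for the arithmetic, and the array figure covers only the 1,000 processor tiles, not the memories, pads, or any other part of the die.

```python
"""Back-of-the-envelope tile arithmetic from the figures on the slide."""

TILE_WIDTH_UM = 239.0           # tile width from the slide, in micrometres
TILE_HEIGHT_UM = 232.3          # tile height from the slide, in micrometres
TRANSISTORS_PER_TILE = 580_000  # slide says "~580k", so this is approximate
NUM_PROCESSOR_TILES = 1000      # processors on the chip


def tile_area_mm2(width_um: float = TILE_WIDTH_UM,
                  height_um: float = TILE_HEIGHT_UM) -> float:
    """Area of one processor tile in mm^2 (1 mm^2 = 1e6 um^2)."""
    return width_um * height_um / 1e6


def array_area_mm2(num_tiles: int = NUM_PROCESSOR_TILES) -> float:
    """Area covered by the processor tiles alone, in mm^2."""
    return num_tiles * tile_area_mm2()


def transistor_density_per_mm2() -> float:
    """Approximate transistors per mm^2 within one tile."""
    return TRANSISTORS_PER_TILE / tile_area_mm2()


if __name__ == "__main__":
    print(f"tile area:  {tile_area_mm2():.4f} mm^2")
    print(f"array area: {array_area_mm2():.1f} mm^2")
    print(f"density:    {transistor_density_per_mm2() / 1e6:.1f} M transistors/mm^2")
```

With the slide's numbers this gives about 0.0555 mm² per tile, about 55.5 mm² for the 1,000 processor tiles, and roughly 10 million transistors per mm² inside a tile.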



### **Metal Plan**

| Layer | Tile metal plan: inter-std. cell signals | Global metal plan: global signals |
|-------|------------------------------------------|-----------------------------------|
| m11   |                                          | 0.13%                             |
| m10   |                                          | 2.50%                             |
| m9    | 2.31%                                    | 18.63%                            |
| m8    | 4.94%                                    | 6.52%                             |
| m7    | 7.02%                                    | 21.49%                            |
| m6    | 7.53%                                    | 7.27%                             |
| m5    | 18.32%                                   | 15.40%                            |
| m4    | 24.71%                                   | 13.56%                            |
| m3    | 22.12%                                   | 6.07%                             |
| m2    | 12.83%                                   | 8.42%                             |
| m1    | 0.21%                                    |                                   |

Signal percentages are the signal length on each layer as a share of total signal length. The tile metal plan also covers standard cells and the local power grid, and the global metal plan also covers the power grid and IO pads (layer assignments shown graphically on the original slide).
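As a sanity check on the per-layer signal percentages in the metal plan, the Python sketch below sums each column and ranks the most heavily used layers. The dictionary and function names are hypothetical, and the assignment of each percentage to a layer follows the table as read here.

```python
"""Consistency check for the per-layer signal-length percentages."""

# Inter-standard-cell signal length per layer within a tile (percent).
TILE_SIGNALS = {
    "m9": 2.31, "m8": 4.94, "m7": 7.02, "m6": 7.53, "m5": 18.32,
    "m4": 24.71, "m3": 22.12, "m2": 12.83, "m1": 0.21,
}

# Global signal length per layer across the chip (percent).
GLOBAL_SIGNALS = {
    "m11": 0.13, "m10": 2.50, "m9": 18.63, "m8": 6.52, "m7": 21.49,
    "m6": 7.27, "m5": 15.40, "m4": 13.56, "m3": 6.07, "m2": 8.42,
}


def total_percent(plan: dict[str, float]) -> float:
    """Sum of all layer percentages; should be close to 100."""
    return sum(plan.values())


def busiest_layers(plan: dict[str, float], count: int = 3) -> list[str]:
    """Layer names ordered from most to least signal length."""
    return sorted(plan, key=plan.get, reverse=True)[:count]


if __name__ == "__main__":
    for name, plan in (("tile", TILE_SIGNALS), ("global", GLOBAL_SIGNALS)):
        print(f"{name}: total {total_percent(plan):.2f}%, "
              f"busiest {busiest_layers(plan)}")
```

Both columns sum to 99.99%, consistent with rounding to two decimals. Tile-level signals concentrate on m3 to m5, while global signals lean on m5, m7, and m9.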





























# Acknowledgments

- Funding and Support
  - DoD and ARL/ARO Grant W911NF-13-1-0090
  - TAPO
  - NSF CAREER award 546907; CCF Grant Nos. 430090, 903549, 1018972, 1321163
  - SRC GRC Grant 1598; CSR Grant 1659; GRC Grant 1971; GRC Grant 2321
  - ST Microelectronics
  - C2S2
  - Intel Corporation
  - UCD Faculty Research Grant
  - MOSIS
  - Artisan