A Complete Real-Time 802.11a Baseband Receiver Implemented on an Array of Programmable Processors

ACSSC 2008 – Pacific Grove, CA

Anh Tran, Dean Truong and Bevan Baas

VLSI Computation Lab, ECE Department, University of California - Davis

- Architecture of an 802.11a Digital Baseband Receiver
- The Target Many-core Computational Platform
- Implementation of the Receiver
- Results and Analysis
- Conclusion

### Architecture of a Complete 802.11a Digital Baseband Receiver



- Three important features required for a practical receiver:
  - Frame detection and timing synchronization
  - Carrier frequency offset (CFO) estimation and correction
  - Channel estimation and equalization

#### Frame Detection and Timing Synchronization



Timing metric (\*):

$$M(n) = \frac{|P(n)|^2}{Q(n)^2}$$

where:

$$P(n) = \sum_{k=0}^{15} r(n+k+16) \cdot r^{*}(n+k)$$

$$Q(n) = \sum_{k=0}^{15} |r(n+k)|^2$$

[Figure: timing metric M(n) versus sample index n at SNR = 20 dB, with the thresholds Th<sub>det</sub> and Th<sub>syn</sub> marked]

(\*)T.M. Schmidl and D.C. Cox, "Robust frequency and timing synchronization for OFDM," *IEEE Transactions on Communications*, pp. 1613-1621, Dec. 1997
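The timing metric can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both sums are taken over k = 0..15 (the extracted limits on the slide are ambiguous), the samples are held in floating point rather than the fixed-point format of the processors, and the names `timing_metric`, `pattern` and `samples` are hypothetical.

```python
def timing_metric(r, n, window=16):
    """Return M(n) = |P(n)|^2 / Q(n)^2 for a list of complex samples r.

    Assumes n + 2 * window <= len(r). The window of 16 samples matches
    the period of one short-training symbol.
    """
    # P(n): correlation between the current window and the next one
    p = sum(r[n + k + window] * r[n + k].conjugate() for k in range(window))
    # Q(n): energy of the current window
    q = sum(abs(r[n + k]) ** 2 for k in range(window))
    if q == 0:
        return 0.0  # no energy at all: report no correlation
    return abs(p) ** 2 / q ** 2


# Hypothetical test signal: any pattern that repeats every 16 samples
# (this is NOT the real 802.11a short-training sequence).
pattern = [complex(1, 1), complex(-1, 1), complex(1, -1), complex(-1, -1)] * 4
samples = pattern * 3
m = timing_metric(samples, 0)  # close to 1.0 for a perfectly periodic input
```

For a noise-free periodic input the two windows are identical, so P(n) equals Q(n) and the metric sits at its maximum of 1.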

#### Frame Detection and Timing Synchronization

| Field    | 10 short-training symbols | 2 long-training symbols with GI2 | SIGNAL symbol | Many OFDM data symbols |
|----------|---------------------------|----------------------------------|---------------|------------------------|
| Contents | s s s s s s s s s s       | GI2 L L                          | GI SIGNAL     | GI Data                |
| Duration | 8 µs                      | 8 µs                             | 4 µs          | N x 4 µs               |

Frame detection:

$$M(n) > Th_{det}$$

or:

$$|P(n)|^2 > Th_{det} \cdot Q(n)^2$$

Timing synchronization:

$$M(n) < Th_{syn}$$

or:

$$|P(n)|^2 < Th_{syn} \cdot Q(n)^2$$
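The cross-multiplied forms avoid a division, which matters on small processors. A minimal sketch of the two-step decision follows. It assumes that synchronization is declared at the first index after detection where the metric falls below Th<sub>syn</sub>; the function name, the input lists and the threshold values in the usage lines are hypothetical.

```python
def detect_and_sync(p_abs_sq, q, th_det, th_syn):
    """Return (detect_index, sync_index); an entry is None if not found.

    p_abs_sq[n] holds |P(n)|^2 and q[n] holds Q(n).
    Both tests are written without division.
    """
    detect_index = None
    for n, (p2, qn) in enumerate(zip(p_abs_sq, q)):
        q2 = qn * qn
        if detect_index is None:
            # Frame detection: |P(n)|^2 > Th_det * Q(n)^2
            if p2 > th_det * q2:
                detect_index = n
        elif p2 < th_syn * q2:
            # Timing synchronization: |P(n)|^2 < Th_syn * Q(n)^2
            return detect_index, n
    return detect_index, None


# Hypothetical metric trace with Q(n) = 1, so |P(n)|^2 equals M(n)
trace = [0.1, 0.9, 0.95, 0.9, 0.1]
result = detect_and_sync(trace, [1.0] * 5, th_det=0.8, th_syn=0.2)
```

The low value at index 0 is ignored because no frame has been detected yet; only a drop that follows a detection counts as the synchronization point.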



### CFO Estimation and Compensation



CFO compensation: using CORDIC Rotation algorithm

(\*) E. Sourour et al., "Frequency offset estimation and correction in the IEEE 802.11a WLAN," *IEEE Vehicular Technology Conference*, pp. 4923-4927, Sep. 2004.
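CORDIC rotation applies a phase rotation using only shifts and adds, which suits processors without a hardware multiplier. The sketch below is a textbook floating-point version, not the AsAP assembly implementation; the function names, the 16-iteration default and the quadrant pre-rotation in `rotate_sample` are assumptions made for illustration.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians in CORDIC rotation mode.

    Converges for |angle| up to about 1.74 rad (the sum of the
    elementary angles), so callers should pre-rotate larger angles.
    """
    gain = 1.0
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        step = 2.0 ** -i               # a right shift by i bits in fixed point
        d = 1.0 if z >= 0 else -1.0    # rotate toward zero residual
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)       # elementary angle, a table lookup in hardware
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, y / gain          # remove the constant CORDIC gain


def rotate_sample(sample, angle):
    """Rotate a complex sample by any angle, folding it into [-pi/2, pi/2]."""
    angle = math.remainder(angle, 2.0 * math.pi)  # wrap to [-pi, pi]
    x, y = sample.real, sample.imag
    if angle > math.pi / 2:
        x, y, angle = -x, -y, angle - math.pi     # pre-rotate by pi
    elif angle < -math.pi / 2:
        x, y, angle = -x, -y, angle + math.pi
    xr, yr = cordic_rotate(x, y, angle)
    return complex(xr, yr)
```

To compensate a frequency offset, each received sample would be rotated by the negative of its accumulated phase error, for example `rotate_sample(r_n, -n * phase_step)` with a hypothetical per-sample `phase_step` taken from the CFO estimate.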

### Channel Estimation and Equalization

| Field    | 10 short-training symbols | 2 long-training symbols with GI2 | SIGNAL symbol | Many OFDM data symbols |
|----------|---------------------------|----------------------------------|---------------|------------------------|
| Contents | s s s s s s s s s s       | GI2 L L                          | GI SIGNAL     | GI Data                |
| Duration | 8 µs                      | 8 µs                             | 4 µs          | N x 4 µs               |

- Channel coefficients:

$$H(k) = \frac{1}{2} \cdot \frac{\widetilde{L_{1}}(k) + \widetilde{L_{2}}(k)}{\widehat{L}(k)}$$

- Channel equalization:

$$\widehat{S_{m}}(k) = \frac{\widetilde{S_{m}}(k)}{H(k)} = \widetilde{S_{m}}(k) \cdot C(k)$$

where:

$$C(k) = \frac{1}{H(k)} = \frac{2\widehat{L}(k)}{\widetilde{L_{1}}(k) + \widetilde{L_{2}}(k)}$$
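Computing C(k) once per frame turns the equalization of every data symbol into one complex multiplication per subcarrier. A minimal sketch of that idea follows, assuming frequency-domain values are held as Python complex numbers; the function names and the three-subcarrier example are hypothetical.

```python
def equalizer_coefficients(l1, l2, l_ref):
    """Return C(k) = 2 * L_ref(k) / (L1(k) + L2(k)) for each subcarrier k.

    l1, l2: the two received long-training symbols after the FFT.
    l_ref:  the known transmitted long-training values.
    Assumes L1(k) + L2(k) is nonzero on every subcarrier used.
    """
    return [2 * ref / (a + b) for a, b, ref in zip(l1, l2, l_ref)]


def equalize(symbol, coeffs):
    """Equalize one received OFDM symbol: S_hat(k) = S_rx(k) * C(k)."""
    return [s * c for s, c in zip(symbol, coeffs)]


# Hypothetical 3-subcarrier example with a known channel
channel = [complex(0.5, 0.5), complex(2, 0), complex(0, -1)]
l_ref = [1, -1, 1]
l_rx = [h * ref for h, ref in zip(channel, l_ref)]    # noise-free training symbol
coeffs = equalizer_coefficients(l_rx, l_rx, l_ref)

sent = [complex(1, 1), complex(-1, 1), complex(1, -1)]
received = [h * s for h, s in zip(channel, sent)]
recovered = equalize(received, coeffs)                # approximately equal to `sent`
```

Averaging the two long-training symbols before the division reduces the effect of noise on the estimate.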

### The Target Computational Platform

- Key features (\*):
  - 164 fine-grained processors
  - 3 configurable accelerators:
    - FFT, Viterbi and Motion Estimation
  - 3 big shared memory modules
  - Circuit-switched network
  - Max. frequency of 1.2 GHz at 1.3 V
  - Fabricated in ST 65 nm process



(\*) D. Truong et al., "A 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling," *VLSI Circuits Symposium*, Jun. 2008.

## Implementation of the Receiver



- Implement the whole system in Matlab
- Program each function on one or more processors in the AsAP assembly language
- Map the whole system onto the AsAP platform
- Compare the results with the Matlab model











## Throughput Evaluation

- Processors on the critical data path determine the receiver's throughput
- Each processor operates as one stage of a pipeline
- The CORDIC Rotation processor is the system bottleneck
- Paced by this bottleneck, each processor processes one OFDM symbol every 15120 cycles
- To achieve 54 Mbps throughput (one OFDM symbol every 4 µs), all processors would have to run at 3.78 GHz
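The 3.78 GHz figure follows from the 4 µs OFDM symbol period: the bottleneck stage must finish its cycles for one symbol before the next symbol arrives. A small sketch of that arithmetic, with a hypothetical function name:

```python
SYMBOL_PERIOD_S = 4e-6  # one 802.11a OFDM symbol every 4 microseconds


def required_clock_hz(cycles_per_symbol, symbol_period_s=SYMBOL_PERIOD_S):
    """Clock rate at which the bottleneck stage keeps up with real time."""
    return cycles_per_symbol / symbol_period_s


clock_ghz = required_clock_hz(15120) / 1e9  # 15120 cycles / 4 us
```

Because the stages form a pipeline, only the slowest stage enters this calculation; the faster stages wait on it.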



### Throughput Improvement

Using 15 processors to pipeline the CORDIC algorithm:



Each group of 3 samples is rotated by 3 CORDIC processors

### Throughput Improvement

- When using 7 CORDIC processors in parallel, the Viterbi Decoder becomes the bottleneck
- No further improvement is possible in software
- Now, each processor processes one OFDM symbol in 2376 cycles
- The receiver obtains 54 Mbps throughput at 590 MHz
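The relation between clock rate and data rate can be checked with a short calculation. It assumes 216 data bits per OFDM symbol in the 54 Mbps mode (54 Mbps times the 4 µs symbol period); the function name is hypothetical.

```python
BITS_PER_SYMBOL_54MBPS = 216  # 54e6 bit/s * 4e-6 s per OFDM symbol


def throughput_mbps(clock_hz, cycles_per_symbol,
                    bits_per_symbol=BITS_PER_SYMBOL_54MBPS):
    """Sustained data rate when the bottleneck needs `cycles_per_symbol`."""
    symbols_per_second = clock_hz / cycles_per_symbol
    return symbols_per_second * bits_per_symbol / 1e6


real_time = throughput_mbps(594e6, 2376)   # clock needed for real-time 54 Mbps
at_max = throughput_mbps(1.2e9, 2376)      # at the platform's maximum frequency
```

With 2376 cycles per symbol the real-time clock works out to about 594 MHz, in line with the roughly 590 MHz quoted on the slide, and the same pipeline at 1.2 GHz gives about 109 Mbps.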



# Comparison

| Work by   | Platform  | Tech. (nm) | Max Freq. (MHz) | Frame Det. & Sync. | CFO Est. & Comp. | Chan. Est. & Eq. | Throughput (Mbps) | Throughput scaled to 65 nm (Mbps) |
|-----------|-----------|------------|-----------------|--------------------|------------------|------------------|-------------------|-----------------------------------|
| Tariq     | TI 62x    | 180        | 200             | -                  | -                | $\checkmark$     | 1.7               | 4.7                               |
| Bakker    | StrongARM | 350        | 130             |                    | $\checkmark$     | -                | 4.3               | 23.2                              |
| Yung      | CoPro.    | 180        | 260             | -                  | -                | $\checkmark$     | 12                | 33.2                              |
| Lin       | SODA      | 180        | 400             | $\checkmark$       | -                | $\checkmark$     | 24                | 66.4                              |
| Sereni    | TI 64x    | 130        | 600             | $\checkmark$       | $\checkmark$     | $\checkmark$     | 36                | 72                                |
| Akabane   | SDR       | 90         | 280             | $\checkmark$       | -                | $\checkmark$     | 54                | 74.7                              |
| this work | AsAP2     | 65         | 1200            | $\checkmark$       | $\checkmark$     | $\checkmark$     | 110               | 110                               |

- Our receiver sustains 110 Mbps throughput at the maximum frequency of 1.2 GHz
- It is a complete receiver and is 1.5x to 23x faster than the others after scaling to 65 nm
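The scaled column appears consistent with multiplying each throughput by the ratio of its technology node to 65 nm. That rule is inferred from the numbers in the table, not stated on the slide, so the sketch below should be read as a plausibility check only; all names in it are hypothetical.

```python
# (work, technology in nm, reported throughput in Mbps), copied from the table
RELATED_WORK = [
    ("Tariq", 180, 1.7),
    ("Bakker", 350, 4.3),
    ("Yung", 180, 12),
    ("Lin", 180, 24),
    ("Sereni", 130, 36),
    ("Akabane", 90, 54),
]
THIS_WORK_MBPS = 110


def scale_to_65nm(throughput_mbps, tech_nm):
    """ASSUMED rule: throughput scales linearly with the feature-size ratio."""
    return throughput_mbps * tech_nm / 65


scaled = {name: scale_to_65nm(mbps, nm) for name, nm, mbps in RELATED_WORK}
speedups = {name: THIS_WORK_MBPS / value for name, value in scaled.items()}
smallest, largest = min(speedups.values()), max(speedups.values())
```

Under this assumption the speedup ranges from about 1.5x (against Akabane) to about 23x (against Tariq), which matches the range quoted above.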

## Summary

- Fine-grained many-core platform
  - Task-level parallelism
  - Highly flexible and scalable
  - Many ways to speed up an application
- A complete 802.11a baseband receiver
  - Supports all necessary features of a real receiver
  - Sustains real-time 54 Mbps throughput at 590 MHz
  - Can sustain up to 110 Mbps if running at maximum frequency
  - 1.5x to 23x faster than related work after scaling to 65 nm
- Future work
  - Improve accelerators
  - Upgrade the platform for mapping more wireless applications

### Acknowledgments

- Intellasys Inc.
- a VEF fellowship
- SRC GRC Grant 1598 and CSR Grant 1659
- ST Microelectronics
- UC Micro
- NSF Grant 0430090 and CAREER Award 0546907
- Intel
- S Machines



## **THANK YOU !**

### Compute Bit Rate and Frame Length

