Hybrid Hardware/Software Floating-Point Implementations for Optimized Area and Throughput Tradeoffs

Jon J. Pimentel
Brent Bohnenstiehl
Bevan M. Baas

VLSI Computation Laboratory
Department of Electrical and Computer Engineering
University of California, Davis

Abstract:

Hybrid floating-point (FP) implementations improve software FP performance without incurring the area overhead of full hardware FP units. The proposed implementations are synthesized in 65 nm CMOS and integrated into small fixed-point processors with a RISC-like architecture. Unsigned, shift carry, and leading zero detection (USL) support is added to a processor to augment an existing instruction set architecture and increase FP throughput with little area overhead. The hybrid implementations with USL support increase software FP throughput per core by 2.18× for addition/subtraction, 1.29× for multiplication, 3.07–4.05× for division, and 3.11–3.81× for square root, and use 90.7–94.6% less area than dedicated fused multiply-add (FMA) hardware. Hybrid implementations with custom FP-specific hardware increase throughput per core over a fixed-point software kernel by 3.69–7.28× for addition/subtraction, 1.22–2.03× for multiplication, 14.4× for division, and 31.9× for square root, and use 77.3–97.0% less area than dedicated FMA hardware. The circuit area and throughput are found for 38 multiply-add, 8 addition/subtraction, 6 multiplication, 45 division, and 45 square root designs. Thirty-three multiply-add implementations are presented, which improve throughput per core versus a fixed-point software implementation by 1.11–15.9× and use 38.2–95.3% less area than dedicated FMA hardware.

Paper

PDF (3.4 MB)

Reference

Jon J. Pimentel, Brent Bohnenstiehl, and Bevan M. Baas, "Hybrid Hardware/Software Floating-Point Implementations for Optimized Area and Throughput Tradeoffs," IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 25, no. 1, pp. 100–113, January 2017 (official date of publication: July 12, 2016).

Note on Publication Date

Although the official publication date of this paper is July 12, 2016, it did not appear in print until the January 2017 issue of the IEEE Transactions on Very Large Scale Integration Systems (TVLSI).

"Manuscript received October 21, 2015; revised February 20, 2016 and April 28, 2016; accepted June 7, 2016. Date of publication July 12, 2016; date of current version December 26, 2016"

BibTeX Entry

@article{Pimentel:TVLSI:2016,
   author    = {Jon J. Pimentel and Brent Bohnenstiehl and Bevan M. Baas},
   title     = {Hybrid Hardware/Software Floating-Point Implementations for Optimized Area and Throughput Tradeoffs},
   journal   = {{IEEE} Transactions on Very Large Scale Integration Systems ({TVLSI})}, 
   month     = jan,
   year      = 2017,
   volume    = 25,
   number    = 1,
   pages     = {100--113},
   note      = {Official date of publication July 12, 2016}
   }

VCL Lab | ECE Dept. | UC Davis

Last update: December 30, 2016