American Journal of Applied Sciences 8 (7): 681-684, 2011 ISSN 1546-9239 © 2011 Science Publications

# Field Programmable Gate Arrays Based Realization of Truncated Multipliers

<sup>1</sup>Muhammad H. Rais and <sup>2</sup>Mohammed H. Al Mijalli <sup>1</sup>Cornea Research Chair, College of Applied Medical Sciences, <sup>2</sup>Department of Biomedical Technology, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia

**Abstract: Problem statement:** Application Specific Integrated Circuits (ASICs) are costly and cannot be reconfigured, yet image processing applications, such as the MPEG video compression used for CT scan frames, must meet real-time conditions, and their algorithms should be verified and optimized before implementation. **Approach:** A Field Programmable Gate Array (FPGA) provides reconfiguration and implementation at the same time. **Results:** The implementation results of truncated multipliers on the Spartan-3AN FPGA showed significant improvement compared with Virtex and Virtex-E FPGA devices. **Conclusion:** Truncated multipliers can be used in medical imaging technology such as CT scan.

Key words: Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Spartan-3AN, truncated multiplier, Virtex-E, VHDL

## INTRODUCTION

For large operands in arithmetic operations, multiplication has always been a hardware-, time- and power-consuming computation. This is more pronounced in digital signal processing (DSP) applications, which involve a large number of multiplications. In DSP, the computational complexity of algorithms has increased to such an extent that they require fast and efficient parallel multipliers (Agostini *et al.*, 2007; Gierenz *et al.*, 2010; Kong *et al.*, 2008; Zemva and Verderber, 2007; Rais, 2009a; 2009b; Rais, 2010; Rais *et al.*, 2010).

Realizing a DSP algorithm requires that the algorithm be verified and optimized before implementation. For this purpose, Field Programmable Gate Arrays (FPGAs) have emerged as a platform of choice for efficient hardware implementation and an attractive alternative to Application Specific Integrated Circuits (ASICs) (Rais, 2009a; 2009b; Rais, 2010; Rais *et al.*, 2010).

Truncated multipliers do not form all of the least-significant columns of the partial-product array. The delay, area and power consumption of the arithmetic unit are significantly reduced as more columns are eliminated. The basic idea of this technique is to discard some of the less significant partial products. In place of the removed partial products, a compensation circuit is introduced that partly compensates for the dropped terms, thus reducing the approximation error. Several research efforts toward hardware-efficient realization of truncated multipliers have been presented in the literature (Rais, 2009a; 2009b; Rais, 2010).
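The column-dropping idea can be illustrated with a small behavioral model. The following is a minimal Python sketch, not the VHDL design evaluated in this study: the function names, the number of dropped columns `k` and the constant `correction` term are all assumptions introduced for illustration, and a constant added at column `k` is only one of several possible compensation schemes.

```python
def partial_product_bits(a, b, n):
    """Yield (column, bit) for each partial-product bit a_i AND b_j
    of an n x n unsigned multiplication."""
    for i in range(n):
        for j in range(n):
            yield i + j, (a >> i) & (b >> j) & 1


def standard_multiply(a, b, n):
    """Sum every partial-product bit at its column weight (full 2n-bit product)."""
    return sum(bit << col for col, bit in partial_product_bits(a, b, n))


def truncated_multiply(a, b, n, k, correction=0):
    """Sum only the bits in columns >= k, plus `correction` units at column k.

    `k` and `correction` are hypothetical parameters, not values from the paper.
    """
    kept = sum(bit << col for col, bit in partial_product_bits(a, b, n) if col >= k)
    return kept + (correction << k)


def max_abs_error(n, k, correction=0):
    """Exhaustively find the largest |exact - truncated| over all n-bit operand pairs."""
    return max(
        abs(standard_multiply(a, b, n) - truncated_multiply(a, b, n, k, correction))
        for a in range(1 << n)
        for b in range(1 << n)
    )


if __name__ == "__main__":
    n = k = 4  # assumed example: drop the 4 least-significant columns of a 4x4 multiplier
    print("exact 13*11      :", standard_multiply(13, 11, n))
    print("truncated 13*11  :", truncated_multiply(13, 11, n, k))
    print("max error, no fix:", max_abs_error(n, k, correction=0))
    print("max error, +1 fix:", max_abs_error(n, k, correction=1))
```

In this model, dropping the four least-significant columns of a 4×4 multiplier removes 10 of the 16 partial-product bits, and the single-unit correction lowers the worst-case absolute error from 49 to 33. The bonded IOB counts in Table 1 (16 for the standard and 12 for the truncated 4×4 multiplier) are consistent with a design that omits the n least-significant output bits, but that reading is an inference from the table rather than a stated design detail.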

## MATERIALS AND METHODS

**Architecture platform:** FPGAs are an ideal platform for implementing computationally intensive and massively parallel architectures, as they are parallel in nature and operate at high clock frequencies. Brief introductions to the Spartan-3, Virtex and Virtex-E FPGAs from Xilinx are presented here.

**Spartan-3 FPGAs:** The Spartan-3 FPGA is from the fifth generation of the Xilinx family. It is designed specifically to meet the needs of high-volume, low-unit-cost electronic systems. The family includes eight members offering densities ranging from 50,000 to five million system gates (Xilinx, 2009). The Spartan-3 FPGA consists of five fundamental programmable functional elements: Configurable Logic Blocks (CLBs), Input/Output Blocks (IOBs), Block RAMs, dedicated multipliers (18×18) and Digital Clock Managers (DCMs).

The Spartan-3 family includes the Spartan-3L, Spartan-3E, Spartan-3A, Spartan-3A DSP, Spartan-3AN and the extended Spartan-3A FPGAs. The Spartan-3AN combines all the features of the Spartan-3A FPGA family with in-system flash memory for configuration and nonvolatile data storage.

**Corresponding Author:** Muhammad H. Rais, Cornea Research Chair, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia

**Virtex FPGAs:** Virtex devices feature a flexible, regular architecture that comprises an array of CLBs, surrounded by programmable IOBs, all interconnected by a rich hierarchy of fast, versatile routing resources.

The Virtex family comprises nine members offering densities ranging from 57,906 to 1,124,022 system gates (Xilinx, 2001). The abundance of routing resources permits the Virtex family to accommodate even the largest and most complex designs.

Virtex FPGAs are SRAM-based and are customized by loading configuration data into internal memory cells. In some modes, the FPGA reads its own configuration data from an external PROM (master serial mode). Virtex devices provide better performance than previous generations of FPGA. Designs can achieve synchronous system clock rates up to 200 MHz including I/O.

**Virtex-E FPGAs:** The Virtex-E FPGA family delivers high-performance, high-capacity programmable logic solutions. The Virtex-E family offers up to 43,200 logic cells in devices up to 30% faster than the Virtex family.

The Virtex-E family comprises eleven members offering densities ranging from 71,693 to 4,074,387 system gates (Xilinx, 2002).

Virtex-E devices have up to 640 Kb of faster (250 MHz) block SelectRAM, but the individual RAMs are the same size and structure as in the Virtex family. They also have eight DLLs instead of the four in Virtex devices. Each individual DLL is slightly improved, with easier clock mirroring and 4x frequency multiplication. Virtex-E devices are built on an aggressive 6-layer metal 0.18 $\mu$m CMOS process.

Virtex-E devices feature a flexible, regular architecture that comprises an array of CLBs surrounded by programmable IOBs, all interconnected by a rich hierarchy of fast, versatile routing resources. Virtex-E FPGAs are SRAM-based and are customized by loading configuration data into internal memory cells. Designs can achieve synchronous system clock rates up to 240 MHz including I/O or 622 Mb/s using Source Synchronous data transmission architectures.

## RESULTS

**FPGA Design and implementation:** The standard and truncated multipliers were designed in VHDL and implemented on Xilinx Spartan-3AN XC3S700AN (package: fgg484, speed grade: -5), Virtex XCV50 (package: fg256, speed grade: -6) and Virtex-E XCV50E (package: fg256, speed grade: -8) FPGAs using the Xilinx ISE 9.2i design tool (Xilinx, 2007).

## DISCUSSION

Figures 1-2 show the differences in average connection delay and maximum pin delay for the FPGA devices. The reduction in pin delay and in the number of occupied slices for the truncated multiplier also shows that it is a feasible solution for medical image processing applications, such as CT scan, where most of the redundant information can be removed. Tables 1-3 summarize the FPGA device resource utilization for standard and truncated multipliers. Table 4 presents the ratio of standard to truncated resource usage as a percentage; for occupied slices it ranges from about 145% to 170% for the Spartan-3AN, Virtex and Virtex-E FPGA devices.
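The percentages in Table 4 can be recomputed from the resource counts in Table 1 (Tables 2 and 3 list the same LUT, slice and gate counts). The short Python sketch below is illustrative only; the dictionary layout and function names are assumptions, and it treats each Table 4 entry as the standard count divided by the truncated count, expressed as a percentage.

```python
# (four-input LUTs, occupied slices, total equivalent gate count), copied from Table 1
STANDARD = {
    "4x4": (30, 16, 180),
    "6x6": (67, 36, 402),
    "8x8": (121, 62, 726),
    "12x12": (289, 148, 1734),
}
TRUNCATED = {
    "4x4": (18, 11, 111),
    "6x6": (43, 24, 261),
    "8x8": (76, 40, 456),
    "12x12": (164, 87, 984),
}


def ratio_percent(standard, truncated):
    """Standard resource count as a percentage of the truncated count."""
    return round(100.0 * standard / truncated, 1)


def table_4():
    """Return {bit width: (LUT %, slice %, gate-count %)} for every multiplier size."""
    return {
        width: tuple(
            ratio_percent(s, t) for s, t in zip(STANDARD[width], TRUNCATED[width])
        )
        for width in STANDARD
    }


if __name__ == "__main__":
    for width, ratios in table_4().items():
        print(width, ratios)
```

The recomputed values agree with Table 4 except in the last digit of one entry: the occupied-slice ratio for the 4×4 multiplier is 16/11, which rounds to 145.5% rather than the 145.4% listed, presumably because the tabulated value was truncated instead of rounded.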



Fig. 1: The average connection delay for Virtex-E, Virtex and Spartan-3AN for standard and truncated multipliers



Fig. 2: The maximum pin delay for Virtex-E, Virtex and Spartan-3AN for standard and truncated multipliers


Table 1: FPGA resource utilization for standard and truncated multipliers for Spartan-3AN XC3S700AN (Package: fgg484, speed grade: -5) (Rais, 2010)

| Bit width | Multipliers | Four Input<br>LUTs<br>(11776) | Occupied<br>slices<br>(5888) | Bonded<br>IOBs (372) | Total<br>equivalent<br>gate count | Average<br>connection<br>delay (ns) | Maximum<br>pin delay (ns) |
|-----------|-------------|-------------------------------|------------------------------|----------------------|-----------------------------------|-------------------------------------|---------------------------|
| 4×4       | Standard    | 30                            | 16                           | 16                   | 180                               | 1.421                               | 3.598                     |
|           | Truncated   | 18                            | 11                           | 12                   | 111                               | 1.272                               | 2.705                     |
| 6×6       | Standard    | 67                            | 36                           | 24                   | 402                               | 1.238                               | 4.873                     |
|           | Truncated   | 43                            | 24                           | 18                   | 261                               | 1.096                               | 2.722                     |
| 8×8       | Standard    | 121                           | 62                           | 32                   | 726                               | 1.085                               | 3.968                     |
|           | Truncated   | 76                            | 40                           | 24                   | 456                               | 1.072                               | 3.641                     |
| 12×12     | Standard    | 289                           | 148                          | 48                   | 1734                              | 1.079                               | 3.766                     |
|           | Truncated   | 164                           | 87                           | 36                   | 984                               | 1.307                               | 3.971                     |

Table 2: FPGA resource utilization for standard and truncated multipliers for Virtex XCV50 (Package: fg256, speed grade: -6)

| Bit width     | Multipliers | Four Input<br>LUTs<br>(11776) | Occupied<br>slices<br>(5888) | Bonded<br>IOBs (372) | Total<br>equivalent<br>gate count | Average<br>connection<br>delay (ns) | Maximum<br>pin delay (ns) |
|---------------|-------------|--------------------|-----------------|----------------|---------------------|--------------------|----------------|
| 4×4           | Standard    | 30                 | 16              | 16             | 180                 | 1.343              | 2.449          |
|               | Truncated   | 18                 | 11              | 12             | 111                 | 1.389              | 3.132          |
| 6×6           | Standard    | 67                 | 36              | 24             | 402                 | 1.469              | 3.933          |
|               | Truncated   | 43                 | 24              | 18             | 261                 | 1.251              | 2.759          |
| 8×8           | Standard    | 121                | 62              | 32             | 726                 | 1.437              | 4.254          |
|               | Truncated   | 76                 | 40              | 24             | 456                 | 1.466              | 4.596          |
| 12×12         | Standard    | 289                | 148             | 48             | 1734                | 1.628              | 4.830          |
|               | Truncated   | 164                | 87              | 36             | 984                 | 1.478              | 3.460          |

Table 3: FPGA resource utilization for standard and truncated multipliers for Virtex-E XCV50E (Package: fg256, speed grade: -8)

| Bit width | Multipliers | Four Input<br>LUTs<br>(11776) | Occupied<br>slices<br>(5888) | Bonded<br>IOBs (372) | Total<br>equivalent<br>gate count | Average<br>connection<br>delay (ns) | Maximum<br>pin delay (ns) |
|-----------|-------------|-------------------------------|------------------------------|----------------------|-----------------------------------|-------------------------------------|-------------------------------|
| 4×4       | Standard    | 30                            | 16                           | 16                   | 180                               | 1.193                               | 2.141                         |
|           | Truncated   | 18                            | 11                           | 12                   | 111                               | 1.009                               | 2.113                         |
| 6×6       | Standard    | 67                            | 36                           | 24                   | 402                               | 1.264                               | 4.449                         |
|           | Truncated   | 43                            | 24                           | 18                   | 261                               | 1.004                               | 2.196                         |
| 8×8       | Standard    | 121                           | 62                           | 32                   | 726                               | 1.148                               | 2.775                         |
|           | Truncated   | 76                            | 40                           | 24                   | 456                               | 1.308                               | 3.437                         |
| 12×12     | Standard    | 289                           | 148                          | 48                   | 1734                              | 1.358                               | 4.361                         |
|           | Truncated   | 164                           | 87                           | 36                   | 984                               | 1.267                               | 3.644                         |

Table 4: Percentage change between the standard and truncated multipliers for Spartan-3AN (Rais, 2010), Virtex and Virtex-E FPGA devices

| Bit width (Multipliers)   | Four input LUTs (11776) | Occupied slices (5888) | Total equivalent gate count |
|---------------------------|-------------------------|------------------------|-----------------------------|
| 4×4 (Standard/Truncated)  | 166.7%                  | 145.4%                 | 162.2%                      |
| 6×6 (Standard/Truncated)  | 155.8%                  | 150%                   | 154%                        |
| 8×8 (Standard/Truncated)  | 159.2%                  | 155%                   | 159.2%                      |
| 12×12 (Standard/Truncated) | 176.2%                  | 170.1%                 | 176.2%                      |

## CONCLUSION

We have presented the hardware design and implementation of an FPGA-based parallel architecture for standard and truncated multipliers using VHDL. The design was implemented on Xilinx Spartan-3AN XC3S700AN, Virtex XCV50 and Virtex-E XCV50E FPGA devices using the ISE 9.2i design tool. The FPGA devices used almost the same number of occupied slices, but their average connection delays and maximum pin delays differ, which indicates that the Spartan-3AN is a better FPGA device than the Virtex and Virtex-E FPGAs. Truncated multipliers can be used in medical imaging technology, such as CT scan, because they require fewer FPGA resources and therefore make it easier to meet real-time conditions.

## ACKNOWLEDGEMENT

The researchers acknowledge the assistance and the financial support provided by the Cornea Research Chair, College of Applied Medical Sciences, King Saud University.

## REFERENCES

- Agostini, L.V., I.S. Silva and S. Bampi, 2007. Multiplierless and fully pipelined JPEG compression soft IP targeting FPGAs. Micropro. Microsys., 31: 487-497. DOI: 10.1016/j.micpro.2006.02.002
- Gierenz, V., C. Panis and J. Nurmi, 2010. Parameterized MAC unit generation for a scalable embedded DSP core. Micropro. Microsys., 4: 138-150. DOI: 10.1109/NORCHP.2008.4738297
- Kong, M.Y., J.M.P. Langlois and D. Al-Khalili, 2008. Efficient FPGA implementation of complex multipliers using the logarithmic number system. Proceedings of the IEEE International Symposium on Circuits and Systems, May 18-21, IEEE Xplore, pp: 3154-3157. DOI: 10.1109/ISCAS.2008.4542127
- Rais, M.H., 2009a. FPGA design and implementation of fixed width standard and truncated 6×6-bit multipliers: A comparative study. Proceedings of the 4th IEEE International Design and Test Workshop, Nov. 15-17, IEEE Xplore, Riyadh, Saudi Arabia, pp: 1-4. DOI: 10.1109/IDT.2009.5404081
- Rais, M.H., 2009b. Efficient hardware realization of truncated multipliers using FPGA. Int. J. Applied Sci. Eng. Tech., 5: 124-128. http://www.waset.org/journals/waset/v57/v57-126.pdf
- Rais, M.H., 2010. Hardware implementation of truncated multipliers using Spartan 3AN, Virtex-4 and Virtex-5 devices. Am. J. Eng. Applied Sci., 3: 201-206. DOI: 10.3844/ajeassp.2010.201.206
- Rais, M.H., B.M. Al-Harthi, S.I. Al-Askar and F.K. Al-Hussein, 2010. Design and field programmable gate array implementation of basic building blocks for power-efficient Baugh-Wooley multipliers. Am. J. Eng. Applied Sci., 3: 307-311. DOI: 10.3844/ajeassp.2010.307.311
- Xilinx, 2001. Virtex FPGA family datasheet. http://www.xilinx.com/support/documentation/virtex.htm
- Xilinx, 2002. Virtex-E FPGA family datasheet. http://www.xilinx.com/support/documentation/virtex-e\_em.htm
- Xilinx, 2007. ISE 9.2i design tool. http://www.xilinx.com/prs\_rls/2007/software/0786\_ise92i.htm
- Xilinx, 2009. Spartan-3 FPGA family datasheet. http://www.xilinx.com/support/documentation/sapartan.htm
- Zemva, A. and M. Verderber, 2007. FPGA-oriented HW/SW implementation of the MPEG-4 video decoder. Micropro. Microsys., 31: 313-325. DOI: 10.1016/j.micpro.2006.11.001