Multiplication In FPGA(3)
上一篇 / 下一篇 2006-05-10 08:45:17 / 天气: 晴朗 / 心情: 高兴 / 个人分类:EDA
5 bit input * 67
5 bit input * 85
000
001
010
011
100
101
110
111
000
0
536
1072
1608
0
680
1360
2040
001
67
603
1139
1675
85
765
1445
2125
010
134
670
1206
1742
170
850
1530
2210
011
201
737
1273
1809
255
935
1615
2295
100
268
804
1340
1876
340
1020
1700
2380
101
335
871
1407
1943
425
1105
1785
2465
110
402
938
1474
2010
510
1190
1870
2550
111
469
1005
1541
2077
595
1275
1955
2635
| 5 bit input * 67 | 5 bit input * 85 | |||||||
| 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 | |
|---|---|---|---|---|---|---|---|---|
| 000 | 0 | 536 | 1072 | 1608 | 0 | 680 | 1360 | 2040 |
| 001 | 67 | 603 | 1139 | 1675 | 85 | 765 | 1445 | 2125 |
| 010 | 134 | 670 | 1206 | 1742 | 170 | 850 | 1530 | 2210 |
| 011 | 201 | 737 | 1273 | 1809 | 255 | 935 | 1615 | 2295 |
| 100 | 268 | 804 | 1340 | 1876 | 340 | 1020 | 1700 | 2380 |
| 101 | 335 | 871 | 1407 | 1943 | 425 | 1105 | 1785 | 2465 |
| 110 | 402 | 938 | 1474 | 2010 | 510 | 1190 | 1870 | 2550 |
| 111 | 469 | 1005 | 1541 | 2077 | 595 | 1275 | 1955 | 2635 |
Constant Multipliers from Adders
| Adder for each '1' bit in constant | |
| Subtractor replaces strings of '1' bits using Booth recoding | |
| Efficiency, size depend on value of constant | |
| KCM multipliers are usually more efficient for arbitrary constant values |
The shift-add multiply algorithm essentially produces m 1xn partial products and sums them together with appropriate shifting. The partial products corresponding to '0' bits in the 1 bit input are zero, and therefore do not have to be included in the sum. If the number of '1' bits in a constant coefficient multiplier is small, then a constant multiplier may be realized with wired shifts and a few adders as shown in the 'times 10' example below. EDA中国门户网站l0ps+\UF&mG
| 0 0000000EDA中国门户网站R
o+Wy+R*VORM 1 1011001 EDA中国门户网站;|-UAFf|vC'Xt 0 0000000 EDA中国门户网站-I"QU Rdp-O 1 +1011001 *m%HSl,{K8g i0 1101111010 |
![]() |
In cases where there are strings of '1' bits in the constant, adders can be eliminated by using Booth recoding methods with subtractors. The 'times 14 example below illustrates this technique. Note that 14 = 8+4+2 can be expressed as 14=16-2, which reduces the number of partial products.EDA中国门户网站5Hw9dP k\n/U
| 0 0000000EDA中国门户网站o$\)ANI 1 1011001EDA中国门户网站kKN|w3z"| 1 1011001 EDA中国门户网站$tEJF g@ 1 +1011001 ^ Wfl,X4t.Dc~0 10011011110 |
0 0000000 (]H hjb{&])n0-1 1110100111 EDA中国门户网站h9x,p3^ @_pk 0 0000000 $I9a]pvr'V/k0J00 0000000 (E9XA*nq&z0?'ofc01 +1011001 EDA中国门户网站1@gO:_$lr1KU 10011011110 |
![]() |
Combinations of partial products can sometimes also be shifted and added in order to reduce the number of partials, although this may not necessarily reduce the depth of a tree. For example, the 'times 1/3' approximation (85/256=0.332) below uses less adders than would be necessary if all the partial products were summed directly. Note that the shifts are in the opposite direction to obtain the fractional partial products.
L i,s9PcT0
Clearly, the complexity of a constant multiplier constructed from adders is dependent upon the constant. For an arbitrary constant, the KCM multiplier discussed above is a better choice. For certain 'quick and dirty' scaling applications, this multiplier works nicely.
(uP!C&VI9[#]0
EDA中国门户网站R s&D#Wi{
Wallace Trees
| Optimized column adder tree | |
| Combines all partial products into 2 vectors (carry and sum) | |
| Carry and sum outputs combined using a conventional adder | |
| Delay is log(n) | |
| Wallace tree multiplier uses Wallace tree to combine 1 x n partial products | |
| Irregular routing | |
| Not optimum in many FPGAs |
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay. Rather than completely adding the partial products in pairs like the ripple adder tree does, the Wallace tree sums up all the bits of the same weights in a merged tree. Usually full adders are used, so that 3 equally weighted bits are combined to produce two bits: one (the carry) with weight of n+1 and the other (the sum) with weight n. Each layer of the tree therefore reduces the number of vectors by a factor of 3:2 (Another popular scheme obtains a 4:2 reduction using a different adder style that adds little delay in an ASIC implementation). The tree has as many layers as is necessary to reduce the number of vectors to two (a carry and a sum). A conventional adder is used to combine these to obtain the final product. The structure of the tree is shown below. The red numbers after each full adder in the illustration indicate the bit weights of each signal. For a multiplier, this tree is pruned because the input partial products are shifted by varying amounts.EDA中国门户网站sB ?+k"R

+V3|!gQ&|Z7s0A section of an 8 input wallace tree. The wallace tree combines the 8 partial product inputs to two output vectors corresponding to a sum and a carry. A conventional adder is used to combine these outputs to obtain the complete product..
If you trace the bits in the tree (the tree in the illustration is color coded to help in this regard), you will find that the Wallace tree is a tree of carry-save adders arranged as shown to the left. A carry save adder consists of full adders like the more familiar ripple adders, but the carry output from each bit is brought out to form second result vector rather being than wired to the next most significant bit. The carry vector is 'saved' to be combined with the sum later, hence the carry-save moniker.EDA中国门户网站\D5AtN7u(E
A Wallace tree multiplier is one that uses a Wallace tree to combine the partial products from a field of 1x n multipliers (made of AND gates). It turns out that the number of Carry Save Adders in a Wallace tree multiplier is exactly the same as used in the carry save version of the array multiplier. The Wallace tree rearranges the wiring however, so that the partial product bits with the longest delays are wired closer to the root of the tree. This changes the delay characteristic from o(n+n) to o(n+log(n)) at no gate cost. Unfortunately the nice regular routing of the array multiplier is also replaced with a rats-nest.
#Qn4A/F/N`0A Wallace tree by itself offers no performance advantage over a ripple adder tree
EDA中国门户网站NAX4O)jP]8qTo the casual observer, it may appear the propagation delay though a ripple adder tree is the carry propagation multiplied by the number of levels or o(n*log(n)). In fact, the ripple adder tree delay is really only o(n + log(n)) because the delays through the adder's carry chains overlap. This becomes obvious if you consider that the value of a bit can only affect bits of the same or higher significance further down the tree. The worst case delay is then from the LSB input to the MSB output (and disregarding routing delays is the same no matter which path is taken). The depth of the ripple tree is log(n), which is the about same as the depth of the Wallace tree. This means is that the ripple carry adder tree's delay characteristic is similar to that of a Wallace tree followed by a ripple adder! If an adder with a faster carry tree scheme is used to sum the Wallace tree outputs, the result is faster than a ripple adder tree. The fast carry tree schemes use more gates than the equivalent ripple carry structure, so the Wallace tree normally winds up being faster than a ripple adder tree, and less logic than an adder tree constructed of fast carry tree adders. An ASIC implementation may also benefit from some 'go faster' tricks in carry save adders, so a Wallace tree combined with a fast adder can offer a significant advantage there.
9o.zsb9^}CW2[8Jr5Kqi0A Wallace tree is often slower than a ripple adder tree in an FPGA
EDA中国门户网站 f-lNN{Many FPGAs have a highly optimized ripple carry chain connection. Regular logic connections are several times slower than the optimized carry chain, making it nearly impossible to improve on the performance of the ripple carry adders for reasonable data widths (at least 16 bits). Even in FPGAs without optimized carry chains, the delays caused by the complex routing can overshadow any gains attributed to the Wallace tree structure. For this reason, Wallace trees do not provide any advantage over ripple adder trees in many FPGAs. In fact due to the irregular routing, they may actually be slower and are certainly more difficult to route.
,TM[^"gQDk0EDA中国门户网站7bz B?5? | U/iF
EDA中国门户网站)FZ_M$u]Xb
![]()
Booth Recoding
EDA中国门户网站g7~3mB[Booth recoding is a method of reducing the number of partial products to be summed. Booth observed that when strings of '1' bits occur in the multiplicand the number of partial products can be reduced by using subtraction. For example the multiplication of 89 by 15 shown below has four 1xn partial products that must be summed. This is equivalent to the subtraction shown in the right panel.
Q x.R;Ze)Bgc0| 1 1011001 $\Z2o'\)?[n01 1011001EDA中国门户网站%a5K#n3Ki0I"J 1 1011001 .DqP7_Nrh_%K)Z?01 1011001 a fyp%E2^8M/c3?00 +0000000 EDA中国门户网站ic3sbxs:Vx 10100110111 |
1 -1011001EDA中国门户网站'gl3Y.k2QbJ 1 0000000 ,xZ O l5?01 0000000 tU#J;Vvb01 0000000 p!YNd$Lt }3i00 +1011001 M'qE3{dS0 10100110111 |
标题搜索
日历
|
|||||||||
| 日 | 一 | 二 | 三 | 四 | 五 | 六 | |||
| 1 | 2 | 3 | |||||||
| 4 | 5 | 6 | 7 | 8 | 9 | 10 | |||
| 11 | 12 | 13 | 14 | 15 | 16 | 17 | |||
| 18 | 19 | 20 | 21 | 22 | 23 | 24 | |||
| 25 | 26 | 27 | 28 | 29 | 30 | 31 | |||
我的存档
数据统计
- 访问量: 3892
- 日志数: 29
- 文件数: 2
- 书签数: 1
- 建立时间: 2006-05-07
- 更新时间: 2007-05-31


