Power optimization design for Xilinx FPGAs

With FPGAs, designers can take full advantage of programmability and the associated tools to estimate power consumption accurately, then apply optimization techniques to make both the FPGA design and the corresponding PCB more power-efficient.


Static and dynamic power consumption and how they vary


At the 90nm process node, leakage current became a serious problem for both ASICs and FPGAs, and at 65nm it is more challenging still. Obtaining higher transistor performance requires lowering the threshold voltage, but that also increases leakage. Xilinx has made many efforts to reduce leakage current. Nevertheless, static power due to leakage still varies by about 2:1 between worst-case and typical process conditions. Leakage power is strongly affected by the core voltage (VCCINT) and is approximately proportional to its cube: a VCCINT increase of only 5% raises static power by about 15%. Finally, leakage current is also closely tied to junction (die) temperature.
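
The cubic relationship can be checked with a few lines of arithmetic. The sketch below assumes only the approximate cube law stated above; the function name and the default exponent are illustrative, not taken from any Xilinx tool.

```python
def leakage_scale(v_new: float, v_ref: float, exponent: float = 3.0) -> float:
    """Relative leakage (static) power when VCCINT moves from v_ref to v_new.

    Assumes the approximate cubic dependence on core voltage described in
    the text; real devices should be checked with the vendor's estimator.
    """
    return (v_new / v_ref) ** exponent


if __name__ == "__main__":
    scale = leakage_scale(1.05, 1.00)
    # A 5% rise in VCCINT gives roughly the 15% increase quoted in the text.
    print(f"+5% VCCINT: leakage x{scale:.3f} ({(scale - 1) * 100:.1f}% higher)")
```

Under this model a 5% increase gives a factor of 1.05 cubed, or about 1.158, which is consistent with the "about 15%" figure.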


The other source of static power in FPGAs is the DC current of operating circuits, which for the most part varies little with process and temperature. Examples include I/O supplies (such as the termination voltages for the HSTL, SSTL and LVDS I/O standards) and the DC current of current-driven I/O such as LVDS. Some analog blocks in the FPGA also contribute static power that depends little on process and temperature: for example, the digital clock managers (DCMs) used to control clocks in Xilinx FPGAs, the phase-locked loops (PLLs) in Xilinx Virtex-5 FPGAs, and the IODELAY elements used for programmable delay of input and output signals.


Dynamic power consumption is the power consumed by switching activity in the FPGA core or I/O. To calculate it, the number of switching transistors and wires, their capacitance, and the switching frequency must be known. In FPGAs, transistors implement the logic and the programmable interconnect between metal wires. The capacitance includes the parasitic capacitance of the transistors and the capacitance of the metal interconnect lines.


The formula for dynamic power is P_DYNAMIC = n × C × V² × f, where n = number of switching nodes, C = capacitance, V = voltage swing, and f = switching frequency.
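
As a minimal sketch, the formula can be evaluated directly. The node count, per-node capacitance, and frequency in the example are made-up illustrative values, not figures for any real device.

```python
def dynamic_power(n_nodes: float, cap_farads: float,
                  v_swing: float, freq_hz: float) -> float:
    """Dynamic power in watts: n * C * V^2 * f, exactly as given in the text.

    n_nodes    : number of switching nodes
    cap_farads : capacitance per node, in farads
    v_swing    : voltage swing, in volts
    freq_hz    : switching frequency, in hertz
    """
    return n_nodes * cap_farads * v_swing ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical example: 10,000 nodes of 10 fF each, 1.0 V swing, 100 MHz.
    p = dynamic_power(10_000, 10e-15, 1.0, 100e6)
    print(f"Estimated dynamic power: {p * 1e3:.1f} mW")
```

Because V enters as a square, voltage is the most effective single lever in this expression; n, C and f each contribute linearly.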


More compact logic packing (through changes to the internal FPGA architecture) reduces the number of switching transistors. Smaller transistors also shorten the wiring between them, which reduces dynamic power. The 65nm transistors in the Virtex-5 FPGA therefore have smaller gate capacitance and shorter interconnect; together these reduce node capacitance by about 15% to 20%, which further lowers dynamic power.


Voltage also affects dynamic power. In moving from the 90nm to the 65nm process, reducing VCCINT from 1.2V to 1V alone cuts the dynamic power of a Virtex-5 FPGA design by about 30%. Combined with the reduction from architectural enhancements, total dynamic power is 40% to 50% lower than with 90nm technology.


(Note: Dynamic power is proportional to the square of VCCINT, but for the FPGA core, it is basically independent of temperature and process.)
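
The quoted savings can be roughly reproduced from the square-law dependence on VCCINT and the 15% to 20% node-capacitance reduction. This is a back-of-the-envelope sketch; the function name is illustrative, and treating the two effects as simply multiplicative is an assumption.

```python
def dynamic_scale(v_new: float, v_old: float, cap_reduction: float = 0.0) -> float:
    """Relative dynamic power after a voltage change and a capacitance cut.

    Assumes power is proportional to C * V^2 with node count and frequency
    held constant; cap_reduction is a fraction (0.15 means 15% less C).
    """
    return (v_new / v_old) ** 2 * (1.0 - cap_reduction)


if __name__ == "__main__":
    for cap_cut in (0.0, 0.15, 0.20):
        saving = 1.0 - dynamic_scale(1.0, 1.2, cap_cut)
        print(f"1.2 V to 1.0 V, C reduced {cap_cut:.0%}: "
              f"dynamic power down {saving:.1%}")
```

Voltage alone gives a reduction of about 31%, and adding the capacitance reduction gives about 41% to 44%, the lower part of the 40% to 50% range. Any further saving would have to come from fewer switching nodes, which this sketch does not model.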


Using FPGA design techniques to reduce power consumption


Xilinx provides two power analysis tools. The XPower Estimator (XPE) spreadsheet tool can be used before the design is run through the physical implementation tools. Once physical implementation is complete, the second tool, XPower Analyzer, can be used to check how design changes affect power consumption.


One way to reduce power consumption is to select the most suitable FPGA for the design, and then use its programmability to further optimize the power consumption of the design. The right design choice will improve both static and dynamic power consumption.


The static power from the leakage current is proportional to the number of logic resources, that is, proportional to the number of transistors used to construct a particular FPGA. Therefore, if the FPGA resources used are reduced and the design is implemented with smaller devices, then static power consumption can be reduced.


Many methods can be used to reduce the size of a design. The most basic is time-sharing (time-multiplexing) of logic functions: if two circuits are identical and each performs the same function, a single circuit running at twice the speed can do the same work. The logic resources required are then cut in half.


Another way to reduce logic size is to use the partial reconfiguration capability of Xilinx FPGAs. When two parts of a circuit never operate at the same time, one region of the device can be reconfigured to implement a different function for a given period.


Functions can also be moved to less constrained resources: for example, a state machine can be moved into BRAM, a counter into a DSP48 block, registers into shift-register logic, and a BRAM into look-up table RAM (LUTRAM). It also helps to make sure the design's timing is not tighter than necessary, because over-tight timing requires more logic and registers.


In addition, the advantages of the hard IP blocks (BRAM, DSP, FIFO, Ethernet MAC, PCI Express) integrated in the FPGA architecture should also be fully utilized.


Another way to reduce static power is to review the design carefully and eliminate redundant sources of DC consumption. Designs often include modules that contain redundant or hidden DCMs or PLLs. Such resources may be forgotten once the module is finished, or may arrive with legacy code reused in a next-generation product. Moving DCMs and PLLs up to the top level of the design lets modules share them, which further reduces design size and DC power.


Better use of memory blocks can also reduce the dynamic power of an FPGA design, and with it the overall power consumption. Since dynamic power is a function of capacitance (area or length) and frequency, you should examine how block memory is accessed in the design and identify where capacitance and frequency can be optimized.


Xilinx FPGAs provide two types of memory array. The 18Kbit or 36Kbit BRAM is optimized for large memories. LUTRAM is built from the FPGA's look-up tables and is optimized for fine-grained storage. In Xilinx Virtex-5 FPGAs, the basic LUTRAM unit is 64 bits.


Of the two, BRAM usually consumes more power. The static power drawn while a BRAM is enabled is the largest part of its consumption, followed by the power due to signal transitions. Designers can take several steps to optimize BRAM power. For example, a BRAM can be enabled only during read or write cycles. Smaller memories can be implemented in LUTRAM, reserving BRAM for larger ones. BRAM can also be used for multiple large memories. Another technique is to arrange the memory array sensibly so as to reduce its delay and area, maximize performance, and minimize power. The left side of Figure 1 shows a 2K x 36bit memory array optimized for speed and area.


This array is built from four 2K x 9bit blocks in parallel, and all four blocks are enabled whenever a new value is needed. An alternative is to build the 2K x 36bit array from four 512 x 36bit blocks and decode the lower two address bits to select which 512 x 36bit block to access. In that case only one memory block is accessed at any given time, and power is reduced by 75% compared with the first method.
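
The difference between the two arrangements can be sketched as a small model that counts how many blocks are enabled per access. It assumes every enabled block costs the same energy per access, which is a simplification; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """One way of tiling a 2K x 36bit array out of identical memory blocks."""
    name: str
    block_depth: int
    block_width: int
    total_depth: int = 2048
    total_width: int = 36

    @property
    def blocks(self) -> int:
        rows = self.total_depth // self.block_depth
        cols = self.total_width // self.block_width
        return rows * cols

    @property
    def enabled_per_access(self) -> int:
        # One access reads a full-width word, so it enables every block
        # that sits side by side across the word width.
        return self.total_width // self.block_width


def select_block(addr: int) -> tuple[int, int]:
    """Split an 11-bit address for the 512 x 36bit arrangement.

    The lower two bits pick the block, as in the text; the remaining
    nine bits address a word inside that block.
    """
    return addr & 0b11, addr >> 2


if __name__ == "__main__":
    wide = Layout("speed/area", block_depth=2048, block_width=9)
    deep = Layout("power", block_depth=512, block_width=36)
    saving = 1 - deep.enabled_per_access / wide.enabled_per_access
    print(f"{wide.name}: {wide.enabled_per_access} of {wide.blocks} blocks enabled")
    print(f"{deep.name}: {deep.enabled_per_access} of {deep.blocks} blocks enabled")
    print(f"Blocks enabled per access reduced by {saving:.0%}")
```

Both arrangements use four blocks, but the second enables one block per access instead of four, which is where the 75% figure comes from under the equal-cost assumption.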


The right side of Figure 1 shows Xilinx's Block Memory Generator, which can generate a memory array of any size and optimize it for speed or for power. Figure 2 shows the Xilinx Power Estimator for a specific application, comparing the power when N blocks are enabled simultaneously with the power when N/4 blocks are enabled at a given enable rate. The results show a 75% reduction in dynamic power.






Figure 1 Memory arrays optimized for speed and area and for power (left), and the Xilinx Block Memory Generator with its power/area selection (right)


Xilinx tools can help select the appropriate memory array. Consider a design that needs two kinds of memory. In one case it requires 16 memories of 64 x 32bit running at 300MHz (32K bits in total); in the other it requires 16 memories of 512 x 36bit (294K bits in total).


For the 16 memories of 64 x 32bit, the XPE tool shows that small memory arrays are best implemented in LUTRAM, which saves 85% of the power compared with BRAM (Figure 3). This is because implementing 16 very small (64 x 32bit) memories in BRAM takes 16 separate 18Kbit blocks, and most of their capacity is wasted. In the second case, comparing power for 16 arrays of 18Kbit each, XPE shows the opposite result: the larger memory arrays should be implemented in BRAM (Figure 4), which saves 28% of the power compared with LUTRAM. This is because a LUTRAM implementation must enable many more fine-grained elements and adds more interconnect.
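
A capacity-only calculation shows why the two cases come out differently. The sketch below uses the 18Kbit BRAM and 64-bit LUTRAM unit sizes given earlier; it ignores address decoding, output multiplexing and routing, so it is an illustration of resource usage rather than a power estimate.

```python
BRAM_BITS = 18 * 1024   # one 18Kbit block RAM
LUTRAM_BITS = 64        # one LUTRAM unit in Virtex-5, per the text


def footprint(depth: int, width: int) -> dict:
    """Capacity-only resource count for one depth x width memory."""
    bits = depth * width
    brams = -(-bits // BRAM_BITS)        # ceiling division
    lutrams = -(-bits // LUTRAM_BITS)
    return {
        "bits": bits,
        "brams": brams,
        "bram_utilization": bits / (brams * BRAM_BITS),
        "lutrams": lutrams,
    }


if __name__ == "__main__":
    for depth, width in ((64, 32), (512, 36)):
        f = footprint(depth, width)
        print(f"{depth} x {width}: {f['bits']} bits, "
              f"{f['brams']} BRAM at {f['bram_utilization']:.0%} utilization, "
              f"or {f['lutrams']} LUTRAM units")
```

A 64 x 32bit memory fills only about 11% of an 18Kbit BRAM but needs just 32 LUTRAM units, while a 512 x 36bit memory fills a BRAM exactly and would need 288 LUTRAM units before any decoding logic is counted.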


Clock gating in Xilinx FPGAs


Clock gating in Xilinx FPGAs has several useful applications. For example, the BUFGMUX clock buffer can be used to turn off a global clock in the FPGA or to switch dynamically to a slower clock. The BUFGCE clock buffer can be used for cycle-by-cycle gating, similar to the clock gating techniques used in ASIC design. Both capabilities can be used together in the same design.


These methods are especially useful in designs where some modules are not always in use yet contribute heavily to power consumption. A large clock domain, which may have thousands of loads, can be switched on or off cycle by cycle or for groups of clock cycles.
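
As a rough first-order model, the dynamic power of a gated clock domain can be scaled by the fraction of cycles in which its clock is enabled. The sketch assumes that nothing in the domain toggles while the clock is gated off; the function name and the 0.2 W figure are hypothetical.

```python
from typing import Sequence


def gated_dynamic_power(ungated_watts: float, enable_trace: Sequence[bool]) -> float:
    """Average dynamic power of a clock domain gated cycle by cycle.

    enable_trace holds one entry per clock cycle, True when the clock
    enable is asserted. Assumes no switching occurs in gated-off cycles.
    """
    if not enable_trace:
        raise ValueError("enable_trace must contain at least one cycle")
    active_cycles = sum(1 for enabled in enable_trace if enabled)
    return ungated_watts * active_cycles / len(enable_trace)


if __name__ == "__main__":
    # Hypothetical domain that draws 0.2 W ungated and is needed 1 cycle in 4.
    trace = [True, False, False, False] * 8
    print(f"Average dynamic power: {gated_dynamic_power(0.2, trace):.3f} W")
```

Actual savings depend on the design and should be confirmed with XPower Analyzer after implementation.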






Figure 2 XPE results for the power-optimized memory array






Figure 3 Power estimates for a small memory array implemented in block RAM or LUTRAM






Figure 4 Power estimates for a large memory array implemented in LUTRAM or block RAM


Reducing power consumption at the board level


PCB designers, mechanical engineers, and system architects can consider several aspects to reduce FPGA power consumption at the circuit board level. FPGA core voltage and junction temperature have a strong influence on different aspects of power consumption.


Controlling the VCCINT core voltage is one way to reduce power at the board level. Both static power due to leakage and dynamic power depend strongly on the FPGA core voltage. One way to reduce both is therefore to hold the core voltage close to its nominal value (1V) rather than operating at the high end of the Virtex-5 voltage range (1.05V, or +5%).


With modern switching regulators, voltage regulation of ±1.5% can be achieved instead of the standard ±5% specification. Holding the core voltage at 1V (rather than the 1.05V maximum) reduces static power due to leakage by about 15% and dynamic power by about 10%.
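
These figures follow approximately from the cube and square laws given earlier. The sketch below compares the worst-case high voltage under two regulator tolerances; the function name is illustrative, and the scaling laws are the approximate ones from the text.

```python
def worst_case_savings(nominal_v: float, loose_tol: float,
                       tight_tol: float) -> tuple[float, float]:
    """Fractional (static, dynamic) power saved at the worst-case high voltage.

    Compares a supply that can reach nominal * (1 + loose_tol) with one
    limited to nominal * (1 + tight_tol). Assumes leakage scales with V^3
    and dynamic power with V^2.
    """
    ratio = (nominal_v * (1 + tight_tol)) / (nominal_v * (1 + loose_tol))
    return 1 - ratio ** 3, 1 - ratio ** 2


if __name__ == "__main__":
    for tight in (0.0, 0.015):
        static, dynamic = worst_case_savings(1.0, 0.05, tight)
        print(f"1.05 V limit vs +{tight:.1%} limit: "
              f"static -{static:.1%}, dynamic -{dynamic:.1%}")
```

Holding exactly 1V instead of 1.05V gives about 13.6% and 9.3% under this model, in line with the rounded 15% and 10% above. A regulator that can still drift to +1.5% recovers somewhat less, roughly 9.7% and 6.6%.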


A simple and obvious way to reduce FPGA junction temperature is to use a PCB or heat sink with better heat dissipation. FPGA designers should push for any such change that reduces power consumption. At a junction temperature of about 100°C, a 15°C drop in temperature can reduce static power due to leakage by 20%.


Power consumption can also be reduced by monitoring temperature and voltage inside the FPGA. The Virtex-5 FPGA contains an analog block called System Monitor, which can monitor external and internal analog voltages and the internal die temperature. System Monitor is based on a 10-bit A/D converter that provides accurate and reliable measurements from -40°C to +125°C. The A/D converter digitizes the outputs of the on-chip sensors and can also sample up to 17 external analog inputs to monitor system performance and the external environment. The block includes configurable thresholds and alarm levels and stores measurement results in registers, so it interfaces easily to user logic or a microprocessor.
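
User logic or a microprocessor that reads the 10-bit temperature result needs to convert it to degrees before comparing it with a limit. The sketch below uses the transfer function as recalled from the Virtex-5 System Monitor user guide (temperature in °C = code × 503.975 / 1024 − 273.15); this is not stated in the article and should be verified against the current documentation. The 85°C limit and the function names are hypothetical.

```python
ADC_BITS = 10
ADC_CODES = 1 << ADC_BITS   # 1024 codes for a 10-bit converter


def code_to_celsius(code: int) -> float:
    """Convert a 10-bit System Monitor temperature code to degrees Celsius.

    Transfer function assumed from the Virtex-5 System Monitor user guide;
    verify the constants before relying on them.
    """
    if not 0 <= code < ADC_CODES:
        raise ValueError(f"code must be in 0..{ADC_CODES - 1}")
    return code * 503.975 / ADC_CODES - 273.15


def over_temperature(code: int, limit_c: float = 85.0) -> bool:
    """True when the measured die temperature exceeds a hypothetical limit."""
    return code_to_celsius(code) > limit_c


if __name__ == "__main__":
    for code in (606, 750):
        print(f"code {code}: {code_to_celsius(code):.1f} C, "
              f"alarm={over_temperature(code)}")
```

In a real system the same comparison can be done by the block's own configurable alarm thresholds, with software only reacting to the alarm, for example by gating clocks or lowering the clock frequency.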


I/O power is another important factor in balancing power against performance, and overall power can be reduced further through better I/O selection. For outputs, the standards with the highest drive strength consume the most power, and that power varies linearly with the output enable rate and toggle rate. LVDS is an exception, because it uses a fixed current source that is independent of the toggle rate. For inputs, voltage-referenced standards consume more power because they require a differential receiver and optional internal termination, both of which draw DC power.


Since termination usually consumes a lot of power, the balance of power and performance should be carefully considered when using it. The use of external interfaces or solutions that do not require termination will greatly reduce power consumption.


Summary


Xilinx has been working to integrate power optimization into the ISE tool suite. ISE can also be configured to run power-optimized synthesis, automatically identifying small arrays in the source code and mapping them to LUTRAM.


Recently, Xilinx also introduced an optimized placer that groups functions to minimize wiring distance and capacitance. A related tool called PlanAhead can group logic resources and perform rough area estimation and physical floorplanning within the FPGA, which reduces capacitance and speeds up routing.


Xilinx expects that FPGA dynamic and static power will continue to face challenges, so it will continue to work on optimizing FPGA power management tools and design methods, and will continue to work hard to solve power consumption issues at the chip level.

