GIGABYTE GTX 570 Review

When NVIDIA introduced its Fermi architecture in the GF100 series GPUs, the NVIDIA 400 series, we saw some really great new technologies come into play in the GPU world. Now that NVIDIA has established its flagship architecture, there is a growing demand for a more refined version of the GF100, and GIGABYTE has provided two GTX 570 cards to show off the new line of Fermi GPUs based on the GF110. When NVIDIA introduced the GF110, GIGABYTE released the GIGABYTE GTX 580, which was a more refined version of the original GF100-based GTX 480/470 in many ways. Today we will take a look at the differences between the GF100 and GF110 GPUs and how the GTX 570 was born to replace the NVIDIA GTX 480/470 series VGAs. We will also take a look at Scalable Link Interface (SLI) with two GIGABYTE GTX 570s on the brand new Sandy Bridge LGA1155 platform with the NF200 chipset. I will then look into overclocking and power consumption, as well as some useful tricks for tuners and benchers alike, such as how to get 3DMark11 to work with these GPUs in SLI.

The GTX 570 is part of GIGABYTE's Hard Core Gamer Series. GIGABYTE product number: GV-N570D5-13I-B

The article will be organized into the following segments:
Introduction (Specifications) and Packaging
Closer look at the GPU and Motherboard Spacing
In-depth look at the GF110 and new features
The Voltage Regulator and other SMD components
Overclocking and Tricks
Benchmarks: Single GPU/O.C. & SLI/SLI O.C.
Conclusion

Evolution of the GTX 570 (differences between cards and specifications)

I highlighted the major aspects and specifications of the top NVIDIA 400 and 500 series VGAs, with the GTX 570 in red, and identical specifications of the two 400 series cards highlighted as well. From these specs you can gather that the GTX 570 has the same number of processors as the GTX 480 while at the same time having the same memory hardware as the GTX 470.
Although it lacks the extra memory and wider memory bus of the GTX 480, its higher clocks will make up for it. NVIDIA did a great job lowering the TDP, so cooling this card won't be as tough, and neither will overclocking.

Now let's start with some specifications. All GF100 and GF110 Fermi cards have 4 Graphics Processing Clusters (GPCs); the GF104/GTX 460, which we won't cover here, uses a different layout. Inside each GPC are Streaming Multiprocessors (SMs) and a raster engine. The number of streaming multiprocessors is the same in the GTX 570 and GTX 480, which means everything inside those streaming multiprocessors is included as well (discussed further in the GF110 section). The GTX 580 has one more streaming multiprocessor than the GTX 570. The differences between the GTX 480 and GTX 570 are that the GTX 480 has 8 extra ROPs, an extra 128KB of L2 cache, and the extra memory controller that the GTX 580 also has. So while the GPCs are the same, the GTX 480's memory is laid out like the GTX 580's. Where the GTX 570 is similar to the GTX 470 is in its memory sub-system: it has the same number of ROPs, and so on. The core clocks of the 500 series GPUs were increased over their predecessors'. I will go more in depth on all the architectural improvements as well as the design of the GPU in the GF110 section.

Beauty and the Beast

I should mention that this GIGABYTE GTX 570 is a reference card. What makes it GIGABYTE-specific are a few very intricate differences, among them the GIGABYTE BIOS, packaging, and accessories. GIGABYTE also has other versions of the GTX 570 which are super clocked and/or have a much more beefed up cooler/VRM for overclocking.

Here is a purple box; not too flashy, but it gets the point across: there is a card in here taller than your motherboard is wide, and it takes up two slots to give you that bang for your buck. For SLI you need two GTX 570s. When you open the box you realize why it's so big; inside you have a wealth of VGA accessories.
Included are: 2 x 6-pin to Molex PSU power adapters, a mini HDMI to HDMI cable, a DVI to D-sub adapter, a driver CD, and finally the manual.

Next we move on to the GPU itself. Outside of its anti-static bag this monster is sleek; unlike the GTX 480, there are no protruding heatpipes. The GIGABYTE GTX 570 uses a vapor chamber that we will go into later in the review. The card's outputs and connectors are protected by blue GIGABYTE-labeled protectors. This card does come at a premium, but the packaging treats it like royalty. The protectors can be left on or removed for 2-way or 3-way SLI.

Closer look at the GPU and Motherboard Spacing

Installation and SLI spacing: when you install the cards you want them in the 16x PCI-E slots on the motherboard. Most current motherboards, but not all, support SLI technology. Here I have the cards installed on my P67A-UD7, whose NF200 PCI-E bridge enables the use of two 16x slots for full-speed SLI. Here we have a single card installation; please do not forget the two 6-pin PCI-E power connectors! Now let's get another angle in the case with the SLI connector attached, as well as two 6-pin PCI-E power cables per card, so I have to use four 6-pin PCI-E power cables in total. Above is the correct 2-way SLI installation method. Below you can see an indentation in the shroud to maintain proper airflow in SLI configurations; this new shroud design is part of the new GIGABYTE 500 series graphics cards.

A closer look at the card: when I said sleek, I wasn't joking; take a look at the shot of the front end of this GPU. Before we move on, here are the dual 6-pin PCI-E power plugs. Now we disassemble. Cover off: as you can see, the heatsink that cools down this 219 watt behemoth is pretty conservative; in a little while we will explain how and why.
Here is a shot of the fan, which is a bit different from the NVIDIA GTX 400 series blower fan in that it has a plastic stabilizer on top, so there is less vibration and thus less noise and more efficiency. Another shot of the fan: here you can see that the heatsink is surrounded by nice foam protection so there is no direct contact with the shroud.

Now we strip the card down to its bare essentials. I would like to make a few comments here: the shroud is designed to maximize airflow into the fan and then into the heatsink. The heatsink has a vapor chamber that acts as a giant heatpipe. Water evaporates at the evaporator wick, and the vapor carries the heat up to the condenser wick, which is attached to the top of the vapor chamber and to the bottom of the aluminum fins. Air from the fan cools the aluminum fins, the vapor re-condenses, and the liquid returns to the evaporator wick to be recycled. The chamber is fully vacuum-sealed so that the water has no problem evaporating and condensing quickly. The vapor chamber is nothing new to either GPUs or coolers, but it is new for stock reference cards.

Here we have the heatsink with its vapor chamber and 100% copper construction; the fins on the heatsink are aluminum, as noted earlier. The thermal paste was not hard, and it seems to be good quality, probably ceramic or something close to it. NVIDIA and GIGABYTE realize that these cards run hotter than your CPU, so they do their best not to cut corners on cooling. Here you can see a tiny reflection; many will be tempted to lap the surface (sand it down to a mirror finish), but that is not needed and should be avoided, as there is most likely a VERY thin layer of copper between the surface and the internal wick to maximize heat exchange.

Earlier I skipped the full body heatsink, which cools down all the hot VRM components such as the low RDS(on) MOSFETs, the drivers for the MOSFETs, and the RAM.
The thermal tape/heat pads used are very common in high-end IC cooling solutions; even the smallest driver gets a piece of heat pad. (It was extremely hard to disassemble this; I had to use a #6 star bit, which is very small, and the screws were also very tight. They are marked with blue to indicate whether they have been unscrewed, and removing them will void your warranty. BUT you can replace the heatsink for the GPU without touching the full cover body heatsink.) There is one more thing I want to mention before we move even deeper into the card's electrical system: these cards are under tight quality control.

Welcome to the GF110, GTX 570 Edition

The greatest downfall of the GF100 series VGAs was their enormous power consumption and large TDP. The GTX 480, the GF100 flagship GPU, had a TDP of 250 watts. The new GF110s boast significantly lower TDPs; I will cover how they did this shortly. The basic architecture of the GF100 was carried over into the GF110, with four Graphics Processing Clusters, each containing four Streaming Multiprocessors and one Raster Engine. Each Streaming Multiprocessor contains the following:
32 CUDA Cores
2 Warp Schedulers
1 PolyMorph Engine (contains the tessellator)
4 Special Function Units (geometry)
4 Texture Units
16 Load/Store Units
64KB L1 Cache and Registers

Outside the GPCs you have 48 ROPs (handling GPC to L2 cache transfers) and 768KB of L2 cache, which communicates and transfers data to the 6 x 64-bit memory controllers. In traditional style, the GTX 570 (as opposed to the GTX 580) has a single Streaming Multiprocessor disabled (greyed out), as well as a disabled memory controller along with its cache and ROPs (greyed out). Instead of 16 SMs we have 15, 512 CUDA cores become 480, and so on and so forth. We also have 40 ROP units as opposed to 48, and 640KB of L2 cache as opposed to 768KB. While this does set the GTX 570 apart from the GTX 580, it doesn't affect the GTX 570's ability to carry out tasks, nor does it limit its features.
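To make the cut-down arithmetic concrete, here is a small Python sketch that derives the headline numbers from the enabled unit counts. The per-unit figures (32 cores per SM, 8 ROPs, 128KB of L2, and a 64-bit channel per memory controller) come straight from the description above; the function name and structure are just for illustration.

```python
# Derived GF110 figures for the GTX 570 (cut-down) vs. GTX 580 (full die).
CORES_PER_SM = 32     # CUDA cores in each Streaming Multiprocessor
ROPS_PER_MC = 8       # ROPs tied to each memory controller
L2_PER_MC_KB = 128    # L2 cache slice per memory controller
MC_WIDTH_BITS = 64    # each memory controller is 64-bit

def gf110_specs(sms, mem_controllers):
    """Headline numbers that follow directly from enabled unit counts."""
    return {
        "cuda_cores": sms * CORES_PER_SM,
        "rops": mem_controllers * ROPS_PER_MC,
        "l2_kb": mem_controllers * L2_PER_MC_KB,
        "bus_width_bits": mem_controllers * MC_WIDTH_BITS,
    }

print(gf110_specs(sms=16, mem_controllers=6))  # GTX 580: 512 cores, 48 ROPs, 768KB L2, 384-bit
print(gf110_specs(sms=15, mem_controllers=5))  # GTX 570: 480 cores, 40 ROPs, 640KB L2, 320-bit
```

Disabling one SM and one memory controller reproduces every GTX 570 number quoted above, which is exactly what "greyed out" means on the block diagram.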
The Fermi architecture was designed so that instructions are carried out in parallel, which is why you see so much redundancy in the core. Here the impact is very minimal; you have a 16-core processor with one core turned off. A bigger impact comes from the reduction in memory from 1.5GB to 1.25GB, accompanied by a reduction in the memory bus from 384-bit to 320-bit.

Transistor Addition and Rearrangement

You are probably asking yourself: how can a card with the same number of cores, the same texture units, the same manufacturing process, and the same transistor count have a lower thermal design power than its predecessor? The answer lies within the GF110 core itself. NVIDIA has been quoted as saying: "Lower leakage transistors on less timing sensitive processing paths and higher speed [leakage] transistors on more critical processing paths."

What does this mean? Transistors are the basic element of every microprocessor; they switch on and off, producing work and heat at the same time. Only part of the energy they consume can be rendered useful, and that wasted portion is what we refer to as leakage. The total of useful work and leakage is what you pay for in heat, so you are always dealing with a variable ratio of useful work to leakage. Now, high leakage transistors are faster and better for overclocking, and they also tend to do more work, so how can NVIDIA keep the same design yet reduce the thermal package? The answer is rearrangement and introduction. NVIDIA used two types of transistors in the original GF100 Fermi: high leakage and low leakage, as we will call them. For the GF110 they added a sort of middle-ground transistor, with leakage in between the high and low leakage types; we can call it a middle leakage transistor to keep it simple.
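The cost of the narrower bus mentioned above can be estimated with a quick back-of-the-envelope calculation: peak memory bandwidth is the per-pin data rate times the bus width divided by eight. Note that the 3.8 Gbps effective GDDR5 data rate used here is the commonly quoted stock figure for the GTX 570, not something stated in this review, so treat the numbers as illustrative.

```python
# Theoretical peak memory bandwidth from bus width and per-pin data rate.
# 3.8 Gbps effective GDDR5 is an assumed stock figure, not from the review.

def bandwidth_gbs(effective_gbps_per_pin, bus_width_bits):
    """Peak bandwidth in GB/s: data rate per pin * number of pins / 8 bits per byte."""
    return effective_gbps_per_pin * bus_width_bits / 8

print(bandwidth_gbs(3.8, 384))  # full 384-bit bus at the same data rate: ~182 GB/s
print(bandwidth_gbs(3.8, 320))  # GTX 570's 320-bit bus: ~152 GB/s
```

In other words, losing one 64-bit memory controller costs about a sixth of the theoretical bandwidth, all else being equal.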
Then NVIDIA rearranged their transistors: high leakage transistors moved into areas where more work is being done, low leakage transistors went where very little work is done, and some of the borderline transistors were replaced with middle leakage ones. While the exact locations are unknown, you can see that they really did a great job: they reduced the TDP (219 watts for the GTX 570 versus 250 watts for the GTX 480, roughly a 12% cut) while raising stock clock speeds and keeping the same architecture.

Z-Culling

Some other things were also improved in the GF110. One of those is known as Z-Culling. Z-Culling takes place in the Raster Engine, and NVIDIA claims its efficiency was improved over the GF100. Now you are probably wondering what Z-Culling actually does. In layman's terms, Z-Culling is basically throwing out pixels that the user will never see. Think about a level in your favorite game: imagine you are coming up on a house in a clearing in the middle of a forest. While you are still in the forest, parts of the house are blocked by the trees, and there is no reason to have the VGA render the pixels for the areas of the house you can't see in that particular frame; z-culling removes those pixels so they are not rendered and don't waste resources. The same thing happens as you approach the house: there are trees behind the house with portions that should not be rendered. While this technology is nothing new, it is still very important.

Power Regulation and Other SMD/ICs

While there are many options for CPU VRM design, on a GPU you only have so much area to work with. On a standard ATX motherboard there is enough room to fit a 24 phase VRM, but on a GPU things are a bit different. This restriction, coupled with the fact that current-day GPUs pull a lot more current than current CPUs, means that the VRM on a GPU needs to pack a heavy punch in a small area.
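Before digging into the power delivery hardware, the z-culling idea described above boils down to a per-pixel depth comparison: a fragment that lands behind something already drawn is rejected before any shading work is spent on it. Here is a toy Python sketch; the function and buffer layout are purely illustrative, not how the raster engine is actually wired.

```python
# Toy depth-buffer test: a fragment is culled (skipped) when something
# nearer has already been drawn at that pixel, so it is never shaded.
# Hardware z-culling does this rejection early and in bulk.

def submit_fragment(depth_buffer, x, y, z):
    """Return True if the fragment survives the depth test (smaller z = nearer)."""
    if z >= depth_buffer[y][x]:
        return False          # hidden behind existing geometry: culled
    depth_buffer[y][x] = z    # nearer than anything drawn so far: keep it
    return True

# 4x4 screen, initially "infinitely far" at every pixel
zbuf = [[float("inf")] * 4 for _ in range(4)]
print(submit_fragment(zbuf, 1, 1, 0.5))  # tree drawn first -> True (rendered)
print(submit_fragment(zbuf, 1, 1, 0.9))  # house wall behind it -> False (culled)
```

The forest-and-house example above is exactly this: the wall fragments behind the trees fail the comparison and are discarded without being rendered.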
A digital PWM controller is a perfect match for these constraints, as digital PWMs are geared towards clearing up PCB real estate while offering very precise voltage control and user-defined regulation. Coupled with high quality low RDS(on) MOSFETs, Hi-C capacitors, low profile inductors, and well-matched drivers, you can pack a lot of punch into a small area. This design is NVIDIA's, and companies like GIGABYTE further refine or redesign the VRM when designing super clocked and overclocked versions of these reference GPUs. Today we are taking a look at the reference design: the PCB used for the GTX 580, stripped of a few things such as the extra RAM and extra VRM components (capacitors and low RDS(on) MOSFETs).

On the left we have a CHiL 8266 six-phase GPU digital PWM controller; on this PCB it can supply up to 6 phases, but for the GTX 570 only 4 phases are implemented. The loss of two phases might take a toll on the sub-zero overclockers. The CHiL has a switching frequency range from 250kHz to 1MHz, and it has the ability to shed load down to just a single phase in low power mode. On the right we have an APW7088 from Anpec Electronics, a two-phase PWM with fixed frequency and integrated MOSFET drivers. The Anpec powers the two phases for the RAM, and the CHiL handles the GPU. So here we have a 4+2 phase VRM designed to generate little heat at idle and pack a nice punch at load.

In orange we have three Texas Instruments INA219 current sensing power ICs, which sense the current pulled from the PCI-E slot and the two 6-pin PCI-E PSU connectors; these little ICs are hooked up directly to shunt resistors to monitor power. This is both good news and bad news. The good news is that the input voltage can vary between the PCI-E slot and the PCI-E 6-pin connectors, meaning this card can handle a dual PSU configuration if needed.
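As a rough illustration of what those sense ICs do: an INA219-style part measures the tiny voltage drop across a known shunt resistor and converts it to current via Ohm's law, then multiplies by the bus voltage to get power. The 5 milliohm shunt value below is an assumption for illustration; the review doesn't state the actual resistor values used on the card.

```python
# Shunt-based current sensing, as an INA219-style IC does it:
# I = V_shunt / R_shunt, and rail power = V_bus * I.

SHUNT_MILLIOHMS = 5  # assumed shunt resistance, illustrative only

def rail_current_a(shunt_voltage_mv):
    """Ohm's law: millivolts across the shunt / milliohms = amps."""
    return shunt_voltage_mv / SHUNT_MILLIOHMS

def rail_power_w(bus_voltage_v, shunt_voltage_mv):
    """Power drawn through one sensed input (PCI-E slot or a 6-pin plug)."""
    return bus_voltage_v * rail_current_a(shunt_voltage_mv)

# e.g. a 12 V input with 30 mV across its shunt is pulling 6 A, i.e. 72 W
print(rail_current_a(30))      # 6.0
print(rail_power_w(12.0, 30))  # 72.0
```

Summing the readings from the three sensed inputs presumably gives the total board power figure that the protection logic works from.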
But the real reason they are here is to help regulate power and enable software-driven overcurrent protection that limits two programs, OCCT GPU and Furmark. Both programs are known to draw much more current than any game could ever load on a GPU, and companies such as NVIDIA are tired of receiving burnt-out cards from people overclocking and then using Furmark and OCCT for stability testing. There is good news though: this limiting is done only in software, so it should have no adverse effect on overclocking, just extra protection. In a little bit I will show you how I got around it for max TDP testing.