Memory Performance: 16GB DDR3-1333 to DDR3-2400 on Ivy Bridge IGP with G.Skill
by Ian Cutress on October 18, 2012 12:00 PM EST
Posted in: Ivy Bridge
Memory reviews are, in my opinion, actually quite hard to do. There are plenty of memory kits available that are nice and cheap, and the easy way to differentiate between them in a review is usually through synthetics – without too much effort we can find memory comparison articles online that deal solely in synthetics. The downside of synthetics is that they rarely emulate real-world performance. When the requests came in for a comparison of memory kits available on the market, I was stumped to find real-world examples where memory truly matters by significant margins, and benchmarks to match. Fast forward a month or so, and we have compiled a series of tests taking advantage of some of the most memory-limited scenarios common to most users – IGP performance using memory from DDR3-1333 to DDR3-2400. Inside this review we have also mixed in some encoding and compression tests, and you may be surprised to hear that USB 3.0 performance is also affected by memory speed. In this article we also look at and review the memory kits that G.Skill has graciously provided from their Ares, Sniper, RipjawsX, RipjawsZ and TridentX brands.
Memory in a Nutshell
Graphical performance is all about vector calculations - moving data from memory to the compute units for calculation, then placing the results back out again where required. High-end graphics cards do this quite well, with the high-end NVIDIA GTX 680 achieving a rated bandwidth of ~192 GB/s. In comparison, integrated graphics have a tough time. Their main memory store is the system memory, which can vary from 10 GB/s to 50 GB/s depending on the platform. There are architectural decisions made in both circumstances (discrete and IGP) to reduce the importance of memory bandwidth, and software can be written to hide memory bandwidth or memory latency issues. But the fact remains that memory bandwidth is vital for a good number of real-world applications and usage scenarios.
The future of memory is a little mysterious, to say the least. Current modern systems run DDR3 SDRAM that can vary in speed from 800 MHz to 3000 MHz, which also varies in price, performance, power usage, and in whether the memory controller can handle such a speed. Those 3000 MHz modules cost a pretty penny, and are reputed to only work with 1 in 10 Ivy Bridge processors. The immediate future for memory still lies in DDR3 – the next iteration, DDR4, is still several years away. We are told that on the Intel side of things, Haswell is DDR3, as will be Broadwell, the Haswell replacement. Reports expect DDR4 to be less than 10% of the market in late 2014 (early adoption in the high-end space), but 50%+ across 2015. DDR4 is expected to have a base speed of 2133 MHz, up to 3200 MHz for initial enthusiast applications – though given the rise in enthusiast speeds, this could seemingly be pushed to 4266 MHz+ over the course of the development cycle. DDR4 is also expected to be a single module per channel, paving the way for up to quad-channel in the mainstream arena.
There are also exciting technologies being developed in the memory space, for both NAND and DRAM – memristors, ReRAM, stacked memory, spintronics et al. If history is anything to go by, as long as these technologies are not hindered by patents, trolls or physics, each could lead to interesting products coming to market. Though we may have to wait several years, and chances are that only one or two will come through for their respective markets, and the rest will go the way of Betamax and HD-DVD.
Back to our DDR3 memory, G.Skill was kind enough to provide us several kits for this overview of memory performance. Most DDR3 kits on sale for the vast majority of users come in speeds from 1333 MHz to 2133 MHz. Anything above DDR3-2133 is definitely in the enthusiast range, and as such G.Skill also sent us a DDR3-2400 kit to test for this overview. In due course we also have a DDR3-2666 kit to test, so stay tuned for that review.
Not All About The MHz
But memory is not all about the MHz, just as computer speed is not all about the MHz and cores. Deciding when memory should be accessed, and what delays are put in place between read and write cycles, is the job of the sub-timings. These sub-timings are arguably more important than the MHz number, as we will see in the review. The main timings on display to the public are the following:
CAS Latency (CL)
RAS to CAS (tRCD)
RAS Precharge (tRP)
Row Active Time (tRAS)
Row Cycle Time (tRC)
Command Rate (CR)
For a very extensive look into memory, our last big memory article went into obscene depth on how memory works. Please read it here – I will confess that I do not understand it all after just reading it once, and need a pen and paper when going through it thoroughly. One of the most important images of that memory article is the following:
Shown here are pair of "back-to-back" reads. Our example Row Cycle Time (tRC) lets us transfer up to 16 bytes of data with a minimum Page open time of 24T using CL-tRCD-tRP-tRAS timings of 6-6-6-18
Using this image, from left to right, we can explain what the timings mean to a certain degree:
tRAS determines the length of time between initialization and the requirement for the memory row to recharge. Within this tRAS we need a tRCD to initialize the column of the row from which we would like to read. After the tRCD is the CL, which provides a read latency. There are also other features which allow for reads across multiple columns within the tRAS, however in order to move to the next row the tRAS needs to end and the tRP allows the next row to precharge.
All this means that:
If tRAS is a low number, it is quick to read from different rows. If it is a high number, reading from different columns is easier.
If CL is a low number, reading from within a row (and the columns) is quicker.
If tRCD is low, more CLs can be initialized inside the tRAS.
If tRP is low, then the overall time (tRAS+tRP) to jump between row reads is quicker.
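The rules of thumb above can be sketched as a toy model. This is a deliberate simplification for illustration (not a cycle-accurate simulator): a read from an already-open row costs only the CAS latency, while switching rows costs the precharge, activate, and CAS delays in sequence.

```python
# Simplified illustration of how the main DDR3 sub-timings combine into
# read latency, measured in memory clock cycles. This is a toy model of
# the row-hit vs row-miss cases described in the text, not a simulator.

def read_latency_cycles(cl, trcd, trp, row_open):
    """Cycles until data appears for a single read.

    row_open=True  -> row hit: only the CAS latency (CL) applies.
    row_open=False -> row miss: precharge the old row (tRP), activate the
                      new row (tRCD), then wait the CAS latency (CL).
    """
    if row_open:
        return cl
    return trp + trcd + cl

# DDR3-1333 9-9-9-24: a row hit costs 9 cycles, a row miss 27.
hit = read_latency_cycles(cl=9, trcd=9, trp=9, row_open=True)
miss = read_latency_cycles(cl=9, trcd=9, trp=9, row_open=False)
print(hit, miss)  # 9 27
```

This makes the trade-off concrete: lowering CL speeds up every read, while lowering tRP and tRCD only helps when jumping between rows.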
When we buy a memory kit, we usually get a SKU number and a description of the modules at hand. Let us look at the first kit we will be testing today:
F3-1333C9Q-16GAO
4x4 GB DDR3-1333 9-9-9-24 1.50V
The first line describes the module in the form of a SKU, which allows for stock checking. In this case, G.Skill’s naming scheme makes it simple – F3 means DDR3; 1333C9 means 1333 MHz with CL9; Q means quad module kit; 16G means it is a 16GB kit; A means the Ares branding; and O means our kit is colored orange.
The second line is a little more readable. First we get the size of the kit (4x4 GB) then the speed (DDR3-1333). Next are the sub-timings, which will always appear in the order of CL-tRCD-tRP-tRAS. This means that our three main sub-timings are 9-9-9, and the tRAS is 24. The last bit of information is the voltage of the kit.
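The SKU decoding described above can be expressed as a small parser. This is a hypothetical helper built only around the naming pattern of the Ares kit shown here – G.Skill's real SKUs vary (the RipjawsX kit later in this review, F3-12800CL9Q-16GBXL, encodes the PC3 bandwidth rating instead of the DDR speed), so this sketch handles only the simple form.

```python
import re

# Hypothetical decoder for the G.Skill SKU pattern described above:
# F3-<speed>C<CL><modules>-<size>G<series><color>
# Real SKUs vary (some encode PC3 bandwidth, e.g. 12800, instead of the
# DDR speed), so this only covers the simple form shown for the Ares kit.

SKU_RE = re.compile(r"F3-(\d+)C(\d+)([QD])-(\d+)G([A-Z])([A-Z]?)")

def decode_sku(sku):
    m = SKU_RE.match(sku)
    if not m:
        raise ValueError(f"unrecognized SKU: {sku}")
    speed, cl, modules, size, series, color = m.groups()
    return {
        "type": "DDR3",          # F3 = DDR3
        "speed_mhz": int(speed),
        "cas_latency": int(cl),
        "modules": {"Q": 4, "D": 2}[modules],  # Q = quad, D = dual kit
        "kit_size_gb": int(size),
        "series": series,        # e.g. 'A' = Ares
        "color": color,          # e.g. 'O' = orange
    }

print(decode_sku("F3-1333C9Q-16GAO"))
```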
So What Can I Do About Sub-Timings?
As a general rule, lower is better. Memory kits on the market vary in their sub-timings – you can purchase DDR3-1333 9-9-9, DDR3-1600 11-11-11, all the way up to DDR3-3000 12-14-14. The question then becomes how to decide between two similar kits. Imagine the following kits:
DDR3-1333 9-9-9
DDR3-1600 10-10-10
DDR3-1866 11-11-11
In a lot of scenarios, an enthusiast may take one look at these numbers and tell a user that these kits are equivalent – boosting the memory speed but increasing the sub-timing latencies causes similar performance. There is one way to determine whether a kit might be better than another, and that is to look at the calculable latency of the kit.
Calculating this value is a simple enough formula:
2000 x (CL / Speed ) = Latency in nanoseconds (ns)
Thus for the three kits above:
DDR3-1333 9-9-9 has a latency of 13.5 ns
DDR3-1600 10-10-10 has a latency of 12.5 ns
DDR3-1866 11-11-11 has a latency of 11.79 ns
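The formula above follows from DDR3 transferring data on both clock edges: the I/O clock runs at half the effective data rate, so latency in ns is CL / (speed / 2) × 1000 = 2000 × CL / speed. A quick sketch reproducing the three figures:

```python
# First-word latency from the formula in the text:
# DDR3 transfers on both clock edges, so the I/O clock is half the
# effective data rate, giving latency_ns = 2000 * CL / speed.

def first_word_latency_ns(cl, ddr_speed):
    return 2000.0 * cl / ddr_speed

for speed, cl in [(1333, 9), (1600, 10), (1866, 11)]:
    print(f"DDR3-{speed} CL{cl}: {first_word_latency_ns(cl, speed):.2f} ns")
```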
This latency essentially tells us which kit is fastest at non-sequential reads. Non-sequential reads are important in a lot of variable scenarios, such as video games whereby the user could perform one of a billion different actions and as such different elements of the memory have to be loaded.
The downside of this test is that it does not take into account consecutive reads. When dealing with conversion, video editing, or anything that requires a large dataset to be read sequentially, we have to look at how long such reads take to process.
The way to check this with DDR3 is as follows:
Cycle time in ns = 1000 / (Memory Speed / 2)
Bit time in ns = 1000 / Memory Speed
The time to read a single word of data (word is a technical term meaning 64 bits) is given by the Cycle Time multiplied by the CL. The time to read eight words is the Cycle Time multiplied by the CL, plus seven Bit Times. Let us go through the memory kits above with this method.
DDR3-1333 9-9-9 has a Cycle Time of 1.5 ns and a Bit Time of 0.75 ns
The time to read one word is 1.5*9 = 13.5 ns
The time to read eight words is 13.5 + 7 * 0.75 = 18.75 ns
DDR3-1600 10-10-10 has a Cycle Time of 1.25 ns and a Bit Time of 0.625 ns
The time to read one word is 1.25 * 10 = 12.5 ns
The time to read eight words is 12.5 + 7 * 0.625 = 16.875 ns
DDR3-1866 11-11-11 has a Cycle Time of 1.072 ns and a Bit Time of 0.536 ns
The time to read one word is 1.072 * 11 = 11.79 ns
The time to read eight words is 11.79 + 7 * 0.536 = 15.54 ns
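The arithmetic above can be wrapped into two small helpers, which reproduce all six figures:

```python
# Cycle time, bit time, and burst read time for DDR3, as defined in the
# text: the first word waits the full CAS latency, and each subsequent
# word of the burst arrives one bit time (half a cycle) later.

def cycle_time_ns(ddr_speed):
    return 1000.0 / (ddr_speed / 2.0)

def bit_time_ns(ddr_speed):
    return 1000.0 / ddr_speed

def read_time_ns(cl, ddr_speed, words=1):
    return cycle_time_ns(ddr_speed) * cl + (words - 1) * bit_time_ns(ddr_speed)

for speed, cl in [(1333, 9), (1600, 10), (1866, 11)]:
    one = read_time_ns(cl, speed)
    eight = read_time_ns(cl, speed, words=8)
    print(f"DDR3-{speed} CL{cl}: 1 word {one:.2f} ns, 8 words {eight:.2f} ns")
```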
In both the short reads and long reads, DDR3-1866 11-11-11 wins out of the three kits. But what if it were not so clear cut?
The following kits have the following timings and results:
DDR3-2000 at 9-9-9 reads one word in 9 ns and eight words in 12.5 ns
DDR3-1600 at 7-7-7 reads one word in 8.75 ns and eight words in 13.125 ns
This means that the DDR3-2000 kit should be better for longer reading workloads, whereas the DDR3-1600 kit should be better for random reads.
I should stress (and add a disclaimer) that this comparison is all at the high level, as we are only talking about memory speed and CAS Latency – everything else plays its part, and I highly suggest reading Rajinder’s memory article to get a deeper look as to how this all works.
Personally, I use these formulas when overclocking competitively – if I have two kits, one of which can do DDR3-2000 6-7-7 and the other is DDR3-2666 11-13-13, I can decide which one is more appropriate for the benchmark in question.
The Command Rate (CR) is the funny little number at the end of the timing list, often quoted as 1T or 2T depending on the memory kit, how many modules are installed, and the motherboard settings. The command rate is the address and command decode latency, essentially the delay between accessing and decoding data - this delay allows the memory time to be accessed without errors.
In an ideal world there would be no latency, and in synthetic benchmarks a setting of 1T is shown to be quicker than 2T. Whether a user can feel the difference (in essence it adjusts peak bandwidth as well) is debatable, but the slower the kit as standard, the more of a difference will be felt between the two options. There is also an argument that a setting of 2T will allow the kit to be overclocked higher.
By default 2T is usually selected for memory kits that contain more modules - on the off chance that one module of the kit cannot perform at the stated speed using 1T timings, defaulting to 2T will make sure more modules pass the binning process.
Standards and The Issue With Memory
Contrary to popular belief, memory kits do not run at their stated speed out of the box. The number of times I have walked through a large LAN event and found people playing games on $2000+ water-cooled systems, only to find that their kit of DDR3-2400 is actually running at DDR3-1333, astounds me. It is a lot more common than you think, and there is probably someone you know who is a culprit. Making sure memory is set at its rated speed is an important part of the process, and as enthusiasts we have a job to make sure that is the case.
Rant aside, this is an important point – when we buy a processor, it always runs at the stated speed. When we plug it into the system, there is no fiddling required. If every time I installed a processor I had to go into the BIOS and adjust it so it runs above 1.2 GHz or 1.6 GHz, I would be annoyed. So why is there this discontinuity on the memory side? Why do we have to go into the BIOS to adjust the memory speed to what it says on the box?
The issue is largely down to compatibility. When a processor is installed into the board, the processor knows that it will go into a board that has the right socket, it knows that there will be pins for a certain number of PCIe lanes or for data transfer to the chipset. It also knows that there will be memory on the end of some pins that runs at a designated multiplier as dictated by the BIOS. The issue with memory is that the memory does not know where it will be plugged into.
A DDR3 module or kit could be plugged into any DDR3-compatible motherboard, and paired with AMD, Intel, or any other processor capable of DDR3, such as server parts. As processor design now puts the memory controller onto the CPU itself, the capabilities of that memory controller can vary wildly. On a Xeon processor, the system may only accept 1600 MHz maximum due to the available multipliers, so it would be foolish to try and boot the system with a 2133 MHz kit attempting to apply full speed. We could plug a DDR3-2666 kit into a Sandy Bridge system, but the memory controller would refuse to run at 2666 MHz. However, take the same motherboard and an Ivy Bridge processor, and the memory should be able to work. Then at the high end, remember I mentioned that there are DDR3-3000 memory kits that only work with 10% of Ivy Bridge i7-3770K processors? There is that too. I could plug a four-module DDR3 kit into a 990FX board, a P67 motherboard, a B75 motherboard, or something nice and obscure. The memory does not know what processor or memory controller it is going to get, but the processor does know that it will get DDR3 when it is plugged in. There are a lot more variables on the memory side which are unpredictable.
With that being said, we have seen some Kingston memory with plug-and-play capabilities. This memory was limited in speed and availability, and did not catch on in the way that it should have. Speaking with memory vendors, the main barrier to this being applied globally is the motherboards themselves – the motherboard should be able to recognize a plug-and-play kit and then adjust accordingly. There are already standards in place (JEDEC, XMP – more on these later), so if the plug-and-play speed does not work, then the speed is reduced down to one that does. It sounds simple, but then again how do we confirm that the memory works? If it boots into an operating system, or if it survives 72 hours of MemTest86 or Linpack? Do people want to wait three days to get the system running at the speed the kit is rated for? The answer is almost certainly no, hence why we are limited to adjusting a BIOS setting to get the speed we want.
I have floated the idea of shipping software with the memory kit to enable XMP through the operating system, but the main barrier to that is the need for the software to work with every motherboard available. The next thought was whether the motherboard manufacturers could create the software, to enable a JEDEC or XMP setting on the next boot. As expected, the answer was the complication of so many modules and so many motherboards. The answer to this new problem would be to add standards to the memory and the motherboards so this all works – but there are already standards. For this to work, it would require a deep partnership between a motherboard manufacturer and a memory vendor, potentially aiding sales on both sides. We will see.
In the meantime, make sure your friends and family are running their memory at rated speed!
Enough! Where Is All The Memory?
This review takes into account five kits from DDR3-1333 to DDR3-2400. Many thanks to G.Skill for providing us with these memory kits, one of each from their Ares, RipjawsX, Sniper, RipjawsZ and TridentX series. Specifically, we have the following kits:
4 x 4 GB DDR3-1333 9-9-9-24 1.50 V : F3-1333C9Q-16GAO (Ares)
4 x 4 GB DDR3-1600 9-9-9-24 1.50 V : F3-12800CL9Q-16GBXL (RipjawsX)
4 x 4 GB DDR3-1866 9-10-9-28 1.50 V : F3-14900CL9Q-16GBSR (Sniper)
4 x 4 GB DDR3-2133 9-11-10-28 1.65 V : F3-17000CL9Q-16GBZH (RipjawsZ)
4 x 4 GB DDR3-2400 10-12-12-31 1.65 V : F3-2400C10Q-16GTX (TridentX)
Over the next few pages, we run down all of these kits.
Comments
frozentundra123456 - Thursday, October 18, 2012
While interesting from a theoretical standpoint, I would have been more interested in a comparison in laptops using HD4000 vs A10 to see if one is more dependent on fast memory than the other. To be blunt, I don't really care much about the IGP on a 3770K. It would have been a more interesting comparison in laptops where the IGP might actually be used for gaming. I guess maybe it would have been more difficult to do with changing memory around so much in a laptop though.
The other thing is I would have liked to see the difference in games at playable frame rates. Does it really matter if you get 5.5 or 5.9 fps? It is a slideshow anyway. My interest is if using higher speed memory could have moved a game from unplayable to playable at a particular setting or allowed moving up to higher settings in a game that was playable.
mmonnin03 - Thursday, October 18, 2012
RAM by definition is Random Access which means no matter where the data is on the module the access time is the same. It doesn't matter if two bytes are on the same row or on a different bank or on a different chip on the module, the access time is the same. There is no sequential or random difference with RAM. The only difference between the different rated sticks are short/long reads, not random or sequential and any reference to random/sequential reads should be removed.
Olaf van der Spek - Thursday, October 18, 2012
You're joking right? :p
mmonnin03 - Thursday, October 18, 2012
Well if the next commenter below says their memory knowledge went up by 10x they probably believe RAM reads are different depending on whether they are random or sequential.
nafhan - Thursday, October 18, 2012
"Random access" means that data can be accessed randomly as opposed to just sequentially. That's it. The term is a relic of an era where sequential storage was the norm.
Hard drives and CD's are both random access devices, and they are both much faster on sequential reads. An example of sequential storage would be a tape backup drive.
mmonnin03 - Thursday, October 18, 2012
RAM is direct access, no sequential or randomness about it. Access time is the same anywhere on the module.
XX reads the same as
Where X is a piece of data and they are laid out in columns/rows.
Both are separate commands and incur the same latencies.
extide - Thursday, October 18, 2012
No, you are wrong. Period. nafhan's post is correct.
menting - Thursday, October 18, 2012
no, mmonnin03 is more correct.
DRAM has the same latency (relatively speaking.. it's faster by a little for the bits closer to the address decoder) for anywhere in the memory, as defined by the tAA spec for reads. For writes it's not as easy to determine since it's internal, but can be guessed from the tRC spec.
The only time that DRAM reads can be faster for consecutive reads, and considered "sequential" is if you open a row, and continue to read all the columns in that row before precharging, because the command would be Activate, Read, Read, Read .... Read, Precharge, whereas a "random access" will most likely be Activate, Read, Precharge most of the time.
The article is misleading, using "sequential reads" in the article. There is really no "sequential", because depending if you are sequential in row, column, or bank, you get totally different results.
jwilliams4200 - Thursday, October 18, 2012
I say mmonnin03 is precisely wrong when he claims that "no matter where the data is on the module the access time is the same".
The read latency can vary by about a factor of 3 depending on whether the read is from an already open row, or whether the desired read comes from a different row than the one already open.
That makes a big difference in total read time, especially if you are reading all the bytes in a page.
menting - Friday, October 19, 2012
no, he is correct.
if every read has the conditions set up equally (ie the parameters are the same, only the address is not), then the access time is the same.
so if address A is from a row that is already open, the time to read that address is the same as for address B, if B is from a row that is already open.
you cannot have a valid comparison if you don't keep the conditions the same between 2 addresses. It's almost like saying the latency is different between 2 reads because they were measured at different PVT corners.