The SSD Relapse: Understanding and Choosing the Best SSD
by Anand Lal Shimpi on August 30, 2009 12:00 AM EST - Posted in Storage
Live Long and Prosper: The Logical Page
Computers are all about abstraction. In the early days of computing you had to write assembly code to get your hardware to do anything. Programming languages like C and C++ created a layer of abstraction between the programmer and the hardware, simplifying the development process. The key word there is simplification. You can be more efficient writing directly for the hardware, but it’s far simpler (and much more manageable) to write high level code and let a compiler optimize it.
The same principles apply within SSDs.
The smallest writable location in NAND flash is a page; that doesn’t mean that it’s the largest size a controller can choose to write. Today I’d like to introduce the concept of a logical page, an abstraction of a physical page in NAND flash.
Confused? Let's start with a hopefully helpful diagram (I'm no artist):
On one side of the fence we have how the software views storage: as a long list of logical block addresses (LBAs). It's a bit more complicated than that, since a traditional hard drive is faster at certain LBAs than others, but to keep things simple we'll ignore that.
On the other side we have how NAND flash stores data, in groups of cells called pages. These days a 4KB page size is common.
In reality there's no fence that separates the two, rather a lot of logic, several buses and eventually the SSD controller. The latter determines how the LBAs map to the NAND flash pages.
The most straightforward way for the controller to write to flash is by writing in pages. In that case the logical page size would equal the physical page size.
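To make that concrete, here's a deliberately simplified sketch of a page-level mapping table. This is an illustration of the concept, not any vendor's actual firmware; the 512-byte sector and 4KB page sizes are just the common values used in this article, and the free-page handling is faked.

```python
SECTOR_SIZE = 512                  # bytes per LBA (what the OS addresses)
PAGE_SIZE = 4096                   # bytes per logical page == physical page
SECTORS_PER_PAGE = PAGE_SIZE // SECTOR_SIZE

class PageLevelMap:
    """Toy flash translation layer: one table entry per 4KB logical page."""
    def __init__(self):
        self.table = {}            # logical page number -> physical page number
        self.next_free_page = 0    # pretend there's always a fresh, erased page

    def write(self, lba, sector_count):
        first_lpn = lba // SECTORS_PER_PAGE
        last_lpn = (lba + sector_count - 1) // SECTORS_PER_PAGE
        for lpn in range(first_lpn, last_lpn + 1):
            # NAND can't be rewritten in place, so redirect the logical page
            # to a freshly erased physical page and update the table.
            self.table[lpn] = self.next_free_page
            self.next_free_page += 1

ftl = PageLevelMap()
ftl.write(lba=0, sector_count=8)   # a single 4KB write touches exactly one entry
```

A 4KB random write updates exactly one entry, which is why this scheme is so fast for small writes; the catch is how many entries the table needs, which is the next problem.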
Unfortunately, there's a huge downside to this approach: tracking overhead. If your logical page size is 4KB, then an 80GB drive has no fewer than twenty million logical pages to keep track of (20,971,520 to be exact). You need a fast controller to sort through and deal with that many pages, a lot of storage to hold the mapping tables, and larger caches/buffers.
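The arithmetic behind that 20,971,520 figure, plus a rough feel for the table size it implies. The 4 bytes per entry below is my own assumption for illustration, not a number from any controller vendor:

```python
drive_capacity = 80 * 1024**3              # 80GB drive (binary gigabytes)
logical_page = 4 * 1024                    # 4KB logical pages

entries = drive_capacity // logical_page
print(entries)                             # 20,971,520 logical pages to track

bytes_per_entry = 4                        # assumed: one 32-bit pointer per page
table_mb = entries * bytes_per_entry / 1024**2
print(table_mb)                            # ~80MB of mapping table alone
```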
The benefit of this approach however is very high 4KB write performance. If the majority of your writes are 4KB in size, this approach will yield the best performance.
If you don't have the expertise, time or support structure to make a big honkin' controller that can handle page-level mapping, you go to a larger logical page size. One such example would involve making your logical page equal to an erase block (128 x 4KB pages). This significantly reduces the number of pages you need to track and optimize around; instead of 20.9 million entries, you now have roughly 164 thousand. All of your controller's internal structures shrink in size and you don't need as powerful a microprocessor inside the controller.
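Running the same back-of-the-envelope math with a logical page equal to a 512KB erase block shows where that roughly 164 thousand figure comes from (163,840 to be exact):

```python
drive_capacity = 80 * 1024**3              # same 80GB drive
block_size = 128 * 4 * 1024                # 512KB logical page (one erase block)

entries = drive_capacity // block_size
print(entries)                             # 163,840 entries instead of ~21 million
```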
The benefit of this approach is very high large file sequential write performance. If you’re streaming large chunks of data, having big logical pages will be optimal. You’ll find that most flash controllers that come from the digital camera space are optimized for this sort of access pattern where you’re writing 2MB - 12MB images all the time.
Unfortunately, the sequential write performance comes at the expense of poor small file write speed. Remember that writing to MLC NAND flash already takes 3x as long as reading; writing small files when your controller can only write in large logical pages makes the penalty far worse. If you want to write an 8KB file, the controller will need to write 512KB (in this case) of data, since that's the smallest size it knows how to write. Write amplification goes up considerably.
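A quick sketch of that penalty in isolation. It ignores garbage collection and whatever else the controller does with the rest of the block, so treat it as a floor on the damage rather than a full write amplification model:

```python
import math

logical_page = 512 * 1024      # smallest write the controller knows how to do
host_write = 8 * 1024          # the host only asked to write 8KB

pages_touched = math.ceil(host_write / logical_page)   # 1 logical page
nand_written = pages_touched * logical_page             # 524,288 bytes programmed
write_amplification = nand_written / host_write
print(write_amplification)     # 64.0 - the NAND absorbs 64x what the host wrote
```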
Remember the first OCZ Vertex drive based on the Indilinx Barefoot controller? Its logical page size was equal to a 512KB block. OCZ asked for a firmware that enabled page level mapping and Indilinx responded. The result was much improved 4KB write performance:
| Iometer 4KB Random Writes, IOQueue=1, 8GB sector space | Logical Block Size = 128 Pages | Logical Block Size = 1 Page |
|---|---|---|
| Pre-Release OCZ Vertex | 0.08 MB/s | 8.2 MB/s |
295 Comments
sotoa - Friday, September 4, 2009 - link
Another great article. You're making me drool over these SSDs! I can't wait till Win7 comes to my door so I can finally get an SSD for my laptop.
Hopefully prices will drop some more by then and TRIM firmware will be available.
lordmetroid - Thursday, September 3, 2009 - link
I use them both because they are damn good and explanatory suffixes. It is 2009, soon 2010; I think we can at least get the suffixes correct. If someone doesn't know what they mean, Wikipedia has answers.
AnnonymousCoward - Saturday, September 5, 2009 - link
As someone who's particular about using SI and being correct, I think it's better to stick to GB for the sake of simplicity and consistency. The tiny inaccuracy is almost always irrelevant, and as long as all storage products advertise in GB, it wouldn't make sense to speak in terms of GiB.
Touche - Thursday, September 3, 2009 - link
Both articles emphasize Intel's performance lead, but, looking at real world tests, the difference between it and the Vertex is really small. Hardly enough to justify the price difference. I feel like the articles give the impression that Intel is in a league of its own when in fact it's only marginally faster.
smjohns - Tuesday, September 8, 2009 - link
This is where I struggle. It is all very well quoting lots of stats about all these drives, but what I really want to know is: if I went for Intel over the OCZ Vertex (non-turbo), where would I really notice the difference in performance on a laptop?
Would it be slower start up / shut down?
Slower application response times?
Speed at opening large zipped files?
Copying / processing large video files?
If the difference is that slim then I guess it is down to just a personal preference....
morrie - Thursday, September 3, 2009 - link
I've made it a habit of securely deleting files by using "shred" like this: shred -fuvz, and accepting the default number of passes, 25. Looks like this security practice is now out, as the "wear" on the drive would be at least 25x faster, bringing the stated life cycles closer to having an impact on drive longevity. So what's the alternative solution for securely deleting a file? Got to "delete" and forget about security? Or "shred" with a lower number of passes, say 7 or 10, and be sure to purchase a non-Intel drive with the ten year warranty and hope that the company is still in business, and in the hard drive business, should you need warranty service in the outer years...
Rasterman - Wednesday, September 16, 2009 - link
Watching too much CSI. There is an article somewhere I read by a data repair tech who works in one of the multi-million dollar data recovery labs; basically he said writing over it once is all you should do, and even that is overkill 99% of the time. Theoretically it is possible to even recover that _sometimes_, but the expense required is so high that unless you are committing a billion dollar fraud or are the secretary to Osama bin Laden, no one will ever try to recover such data. Chances are if you are in such circles you can afford a new drive 25x more often. And if you have such information or knowledge, wouldn't it be far easier and cheaper to simply beat it out of you than to try to recover a deleted drive?
iamezza - Friday, September 4, 2009 - link
1 pass should be sufficient for most purposes. Unless you happen to be working on some _extremely_ sensitive/important data.derkurt - Thursday, September 3, 2009 - link
I may be wrong on this, but I'd assume that once TRIM is enabled, a file is securely deleted once it has been deleted at the filesystem level. However, it might depend on the firmware when exactly the drive actually erases the flash blocks that TRIM marks as deletable. For performance reasons the drive should do that as soon as possible after a TRIM command, but preferably at a time when there is not much "action" going on - after all, the whole point of TRIM is to shift the erasing of flash blocks to a point where the drive is idle.
morrie - Thursday, September 3, 2009 - link
That's on a Linux system btw.
As to aligning drives... how about an update to the article on what needs to be done/ensured, if anything, for using the drives with a Linux OS?