dr strangedrive redux: should you swap on an SSD?

In my recent post about SSDs I did make one major omission, which a friend pointed out on Twitter afterward:

Indeed, I don’t run a swap partition in my desktop PC — RAM is cheap, so I have 12GB of it, and if you’re debating the cost of an SSD, you can probably afford 8-12GB of RAM, too. Let’s play devil’s advocate, though, and say that you can’t upgrade your RAM for whatever reason. Conventional wisdom says that swapping on an SSD is a sure-fire way to send it to an early grave, but is that really the case?

Individual flash cells do have a finite limit on the number of times they can be erased, so it seems to follow that if one part of your SSD (say, your swap partition) sees a lot more writes than other areas, it will wear out more quickly. That doesn’t actually happen on a modern SSD, though: they use wear leveling to spread writes as evenly as possible across all available flash. Even if you overwrite a single disk block repeatedly, the SSD’s controller will keep moving that block to different flash cells, transparently remapping things to hide the details from the OS.

Swapping on an SSD, then, should cause no more stress than any other write activity, so it should be perfectly safe, as long as those extra writes don’t push the SSD beyond what it can handle. This calls for another test!

The test

I forced my PC to use swap in a civilised manner, without resorting to pulling out sticks of RAM


As in my last post, I observed my write traffic across a typical work day, but with one difference: I removed 8GB of RAM (by rebooting and adding “mem=5G” to my kernel command line, which left me with just over 4GB of RAM once various bits of hardware address space had been accounted for) and replaced it with a swap partition.

The write activity was much more spiky — there are several times when substantial amounts of data are written to swap — and it’s higher on average, too, but it’s clear from the graph that there’s still nothing to worry about. Across the day, about 2.7GB of data was written to swap, and the total data written was 13GB, well below the 5-year lifespan threshold of 40GB/day that I established in my last post.


In fact, if you’re stuck with a PC with limited RAM, I’d heartily recommend swapping on an SSD! It’s so fast that you never really notice that you’re swapping, especially without the sound of a busy hard drive to remind you. In fact, I barely noticed that two-thirds of my RAM was missing.

Swap tuning

With some tuning, you may in fact find yourself using less swap on an SSD than you would on a hard drive. If you’ve been using Linux for a while, you’ve probably learned (perhaps after making a semi-panicked “what’s using all my RAM?” post on a Linux forum) that Linux will use all of your free RAM as disk cache to improve performance. However, Linux goes further than that: it’ll sometimes push application data from RAM to swap just to grow its disk cache.

If this seems odd, consider a scenario where you have some apps running in the background that you’re not using at the moment. Doesn’t it make sense to page out those apps and free some RAM for disk caching to improve the performance of the apps you are using? On a hard drive, it certainly does, but random reads on an SSD are so fast that the benefits of that extra disk cache probably aren’t worth the cost of swapping.

You can control how likely the kernel is to use swap by altering the appropriately-named “swappiness” parameter. The default value is 60, and reducing this makes the kernel less likely to swap; on an SSD, you can probably drop this all the way to 0. To do that, add this to your “/etc/sysctl.conf” file, and then either reboot or run “sudo sysctl -p” to put it into effect:

vm.swappiness = 0

Another parameter, “vm.vfs_cache_pressure”, is often mentioned in SSD tuning guides, too — this controls caching of directory and inode objects, with values lower than the default of 100 making the kernel more likely to keep those cached in RAM. The effect this has isn’t entirely clear to me, but if you want to experiment, add this to your “/etc/sysctl.conf” file:

vm.vfs_cache_pressure = 50

Of course, these values are just guides; if you find other values that work better for you, go with those instead. Don’t be afraid to leave these at their default values, either; Linux has a multitude of tunable parameters, and just because you can tune something doesn’t mean you should, especially if you’re unsure what effect different values might have.
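Before changing either setting, it can help to see what the kernel is currently using. Here's a minimal sketch: the function name “show_vm_tunables” is made up for illustration, and it assumes the usual “/proc/sys/vm” location unless you pass it another directory.

```shell
# Print the current values of the two tunables discussed above.
# The optional argument is the directory to read from (default: /proc/sys/vm).
show_vm_tunables() {
    dir="${1:-/proc/sys/vm}"
    for name in swappiness vfs_cache_pressure; do
        # Skip quietly if the file isn't there (e.g. on a non-Linux system).
        if [ -r "$dir/$name" ]; then
            printf 'vm.%s = %s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
    return 0
}

show_vm_tunables
```

The output uses the same “name = value” form as “/etc/sysctl.conf”, so it's easy to compare against what you've configured.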

A note on drive life estimates

After two weeks, I'm yet to go through a single full erase cycle on my drive. That's reassuring!


It’s worth mentioning, too, that this 72TB estimate of the M4’s lifetime seems to be somewhat conservative. Its flash cells can handle about 3000 erase cycles before failing, so if you overwrote all 256GB of flash 3000 times, you’d get not 72TB of writes, but 750TB. The factor-of-ten disparity between these two figures is due to a phenomenon called write amplification, where the shuffling of data performed by wear leveling and garbage collection causes some data to be written to the underlying flash more than once.
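The two figures above, and the write amplification factor they imply, can be reproduced with a couple of shell functions. This is only a sketch: the function names are invented for illustration, and it assumes 1TB = 1024GB, which is the convention that gives 750TB.

```shell
# Raw flash endurance in TB: capacity (GB) x erase cycles, with 1TB = 1024GB.
raw_endurance_tb() {   # usage: raw_endurance_tb <capacity_GB> <erase_cycles>
    awk -v gb="$1" -v cycles="$2" 'BEGIN { printf "%.0f\n", gb * cycles / 1024 }'
}

# Write amplification implied by a rated endurance: raw endurance / rated endurance.
implied_write_amplification() {   # usage: implied_write_amplification <raw_TB> <rated_TB>
    awk -v raw="$1" -v rated="$2" 'BEGIN { printf "%.1f\n", raw / rated }'
}

raw_endurance_tb 256 3000             # prints 750
implied_write_amplification 750 72    # prints 10.4
```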

The controllers inside SSDs strive to keep write amplification as close to 1 as possible (that is, no amplification), and some even use compression to push it below 1 in some cases. How successful they are depends on several factors: the nature of the workload, how much spare flash the controller has to work with (this is where TRIM really helps), and just how good the controller’s algorithms are. A write amplification factor of 10 is really quite extreme, so I’d expect my M4 to last far beyond 72TB of writes (assuming the controller doesn’t fail first).

The 3000 erase cycles is just a conservative estimate, too — that’s when flash cells are likely to start dying, but they won’t all die at once, and most SSDs include some amount of spare flash that they can substitute for failed cells. In one endurance test, a user managed 768TB of writes to a 64GB Crucial M4; at that smaller size, that works out to more than 12000 erase cycles.
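The erase-cycle figure for that endurance test is simple division, sketched below. The function name is hypothetical, 1TB is taken as 1024GB, and the result counts host writes only, so it ignores whatever write amplification the drive added on top.

```shell
# Equivalent full-drive erase cycles: total data written / drive capacity.
# Assumes 1TB = 1024GB and ignores write amplification.
full_drive_cycles() {   # usage: full_drive_cycles <written_TB> <capacity_GB>
    awk -v tb="$1" -v gb="$2" 'BEGIN { printf "%.0f\n", tb * 1024 / gb }'
}

full_drive_cycles 768 64   # prints 12288
```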

dr strangedrive or: how I learned to stop worrying and love SSDs

I’ve had bad luck with hard drives lately — in the last month or so I’ve lost two of the drives from my desktop PC. Luckily, I’d set up RAID-1 for my Linux install just beforehand, so I didn’t lose anything important (just my Windows drive, hah), but with just one drive left, I needed some kind of replacement.

I could’ve bought another hard drive, but damnit, spinning disks are from the past, and we’re living in the future! Instead, I bought myself a shiny new SSD.

Wolf in mini-sheep’s clothing

To be specific, I got a 256GB Crucial M4 — it’s not the latest and greatest SSD, but it’s been on the market long enough to prove its reliability. It looks so unassuming in its tiny, silent 2.5″ case, but it’s crazy-fast, with read speeds of 450MB/s, write speeds of about 260MB/s (not as fast as some newer drives, but perfectly respectable), and insanely-fast seek times that can make it dozens or even hundreds of times faster than a hard drive in real-world applications.

More than anything else, an SSD makes your PC feel strangely snappy. Boot times and application launch times both benefit hugely: Firefox now takes less than a second to spring to life, even if I’ve only just booted my PC, and starting LibreOffice takes maybe half a second.

Even when attached to a 3.5" bay extender, SSDs look tiny compared to 3.5" hard drives


To get some numbers, I tested something that’s always been slow on my studio PC: loading large instruments into LinuxSampler. LS streams most of the sample data on-the-fly, but it still needs to cache the start of each sample into RAM, and that requires a bunch of seeking. Here you can see the load times for Sampletekk’s 7CG Jr, a 3GB GigaSampler file, and the Salamander Grand Piano, a 1.9GB SFZ, from both my SSD and my old 1TB Seagate Barracuda 7200.12 hard drive; the SSD is about 4-to-6 times faster:


Is flash’s limited lifetime really worth worrying about?

So, SSDs have fantastic performance, and they’re now (relatively) affordable, but I did have one concern: the fact that flash memory cells can only be erased a certain number of times before they wear out. Modern SSDs use techniques like wear-leveling and over-provisioning to minimise writes to each flash cell (this Ars Technica article is a great read if you want to know more), but it’s hard not to think that every byte you write to the drive is hastening its demise.

I worried even more after I ran “iotop” to look at per-process disk usage, and saw that Firefox was writing a lot of data. It writes several things to disk on a regular basis (cached web content, known malware/phishing URLs, and crash recovery data), and that can add up to several MB per minute, or several GB per day.

To see if this really was a problem or not, I used iostat to capture per-minute disk usage stats across a typical day. I did all my usual things — I left Firefox, Chrome, Thunderbird, and Steam running the whole time, I spent my work hours working, and then I toyed with some music stuff in the evening. The results are graphed below:


There’s one hefty spike in the evening, when I copied 3.6GB of guitar samples from my hard drive to my SSD (maybe this wasn’t an entirely typical day!), but for the most part, I was writing about 5-15MB per minute to the SSD. The total for the day was 15GB.

That sounds like a lot, but it’s nothing my SSD can’t handle. It’s rated for 72TB of writes over its lifetime, and while that’s an approximate figure, it’s a useful baseline. Over a five-year lifespan, that works out to 40GB of writes a day, or 27.8MB per minute; that’s the red line on the graph above, which was well above my actual usage for almost the entire day.
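To work out the same budget for a different drive, the arithmetic is easy to script. This is just a rough sketch: “write_budget” is a name made up for illustration, and it assumes decimal units (1TB = 1000GB, 1GB = 1000MB) and 365-day years. Because it doesn't round up to 40GB a day before dividing, its figures come out slightly lower than the ones quoted above.

```shell
# Daily and per-minute write budget for a drive with a rated write endurance.
# Assumes decimal units and 365-day years.
write_budget() {   # usage: write_budget <rated_TB> <years>
    awk -v tb="$1" -v years="$2" 'BEGIN {
        gb_per_day = tb * 1000 / (years * 365)
        mb_per_min = gb_per_day * 1000 / (24 * 60)
        printf "%.1f GB/day, %.1f MB/min\n", gb_per_day, mb_per_min
    }'
}

write_budget 72 5   # prints 39.5 GB/day, 27.4 MB/min
```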

When you see a graph like this, it flips your perceptions. If I’m happy to accept a five-year lifespan for my SSD, then every minute I’m not writing 27.8MB to it is flash lifetime that’s going to waste! Smaller SSDs tend to have shorter lifetimes, as do cheaper SSDs, but with typical desktop usage, I don’t think there’s any reason to worry about the life of your SSD, especially if you’re not using your PC 10-12 hours a day or running it 24/7 like I often do.
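If you want to take the same kind of per-minute measurement on your own machine, a similar log can be pieced together from the kernel's own counters. This sketch is not the iostat method used for the graph above: it assumes the SSD is “sda”, and it relies on the seventh field of “/sys/block/<dev>/stat” being the number of 512-byte sectors written. The function names are invented for illustration.

```shell
# Read the cumulative count of 512-byte sectors written (field 7 of the stat file).
sectors_written() {   # usage: sectors_written <stat_file>
    awk '{ print $7 }' "$1"
}

# Convert the difference between two sector counts to megabytes (1MB = 1000000 bytes).
sectors_to_mb() {   # usage: sectors_to_mb <start_sectors> <end_sectors>
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) * 512 / 1000000 }'
}

# Print "HH:MM <MB written>" once a minute until interrupted.
# The device name defaults to "sda", which is an assumption.
log_writes() {   # usage: log_writes [device]
    stat_file="/sys/block/${1:-sda}/stat"
    prev=$(sectors_written "$stat_file")
    while sleep 60; do
        cur=$(sectors_written "$stat_file")
        echo "$(date +%H:%M) $(sectors_to_mb "$prev" "$cur")"
        prev=$cur
    done
}
```

Running “log_writes sda > writes.log” for a day gives one line per minute, which can then be compared against the per-minute budget.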

SSD tuning

There are dozens of SSD tuning guides out there, but most of them spend a lot of time whipping you into a “don’t write all the things!” frenzy, so instead of linking to one of those, I’ll just reiterate two things that you should do to get the most from your SSD.

The first is to enable TRIM support. This lets the OS tell the SSD when disk blocks are no longer needed (because the files they contained were deleted, for instance); that gives the SSD more spare space to use, which helps reduce drive wear and increases write performance. To enable TRIM, add “discard” to the mount options on each filesystem on your SSD, like so:

/dev/mapper/ssd-ubuntu_root  /  ext4  discard,errors=remount-ro  0  1

If you’re using LVM, like I am, then you’ll also have to edit the “/etc/lvm/lvm.conf” file, and add the line “issue_discards = 1” to the “devices” section, to make sure that LVM passes the TRIM commands through to the SSD.

The second is to select an appropriate IO scheduler. IO schedulers are bits of code within the Linux kernel that arrange read and write operations into an appropriate order before they’re sent to the disk. The default scheduler, “CFQ”, is designed for desktop loads on regular hard drives, but its efforts are wasted on SSDs, where seek times are so much lower.

For SSDs, you’re better off with the “deadline” scheduler, which is designed for high throughput on servers, where disks tend to be faster, or you can even use the “noop” scheduler, which does no reordering at all. To set the scheduler on boot, add this to your “/etc/rc.local” file (most Linux distros have one of these):

echo deadline >/sys/block/sda/queue/scheduler
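To check that the change took effect, you can read the same file back: the kernel lists the available schedulers and marks the active one with square brackets. A small helper can pull that name out. The name “active_scheduler” is just for illustration, and “sda” is assumed to be your SSD.

```shell
# Print the active scheduler, i.e. the name shown in square brackets.
active_scheduler() {   # usage: active_scheduler <scheduler_file>
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Only query the real device if it exists on this machine.
if [ -r /sys/block/sda/queue/scheduler ]; then
    active_scheduler /sys/block/sda/queue/scheduler
fi
```

If the file contains “noop deadline [cfq]”, for instance, the helper prints “cfq”.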

To be honest, the choice of IO scheduler probably won’t make much difference — it just improves performance a little (it won’t have any impact on lifespan), but your SSD is going to be so fast regardless that I doubt you’d ever notice. It’s an easy fix, though, so it’s worth the 10 seconds it’ll take to perform.

So go forth, buy an SSD, make a couple of minor tweaks, and then don’t be afraid to enjoy it!