dr strangedrive redux: should you swap on an SSD?

In my recent post about SSDs I did make one major omission, which a friend pointed out on Twitter afterward:

Indeed, I don’t run a swap partition in my desktop PC — RAM is cheap, so I have 12GB of it, and if you’re debating the cost of an SSD, you can probably afford 8-12GB of RAM, too. Let’s play devil’s advocate, though, and say that you can’t upgrade your RAM for whatever reason. Conventional wisdom says that swapping on an SSD is a sure-fire way to send it to an early grave, but is that really the case?

Individual flash cells do have a finite limit on the number of times they can be erased, so it makes sense that if one part of your SSD (say, your swap partition) sees a lot more writes than other areas that it would wear out more quickly. That doesn’t actually happen on a modern SSD, though — they use wear leveling to spread writes as evenly as possible across all available flash. Even if you overwrite a single disk block repeatedly, the SSD’s controller will keep moving that block to different flash cells, transparently remapping things to hide the details from the OS.

Swapping on an SSD, then, should cause no more stress than any other write activity, so it should be perfectly safe, as long as those extra writes don’t push the SSD beyond what it can handle. This calls for another test!

The test

I forced my PC to use swap in a civilised manner, without resorting to pulling out sticks of RAM

I forced my PC to use swap in a civilised manner, without resorting to pulling out sticks of RAM

As in my last post, I observed my write traffic across a typical work day, but with one difference: I removed 8GB of RAM (by rebooting and adding “mem=5G” to my kernel command line, which left me with just over 4GB of RAM once various bits of hardware address space had been accounted for) and replaced it with a swap partition.

The write activity was much more spiky — there are several times when substantial amounts of data are written to swap — and it’s higher on average, too, but it’s clear from the graph that there’s still nothing to worry about. Across the day, about 2.7GB of data was written to swap, and the total data written was 13GB, well below the 5-year lifespan threshold of 40GB/day that I established in my last post.

made with ChartBoot

In fact, if you’re stuck with a PC with limited RAM, I’d heartily recommend swapping on an SSD! It’s so fast that you never really notice that you’re swapping, especially without the sound of a busy hard drive to remind you. In fact, I barely noticed that two-thirds of my RAM was missing.

Swap tuning

With some tuning, you may in fact find yourself using less swap on an SSD than you would on a hard drive. If you’ve been using Linux for a while, you’re probably learned (perhaps after making a semi-panicked “what’s using all my RAM?” post on a Linux forum) that Linux will use all of your free RAM as disk cache to improve performance. However, Linux goes further than that: it’ll sometimes push application data from RAM to swap just to grow its disk cache.

If this seems odd, consider a scenario where you have some apps running in the background that you’re not using at the moment. Doesn’t it make sense to page out those apps and free some RAM for disk caching to improve the performance of the apps you are using? On a hard drive, it certainly does, but random reads on an SSD are so fast that the benefits of that extra disk cache probably aren’t worth the cost of swapping.

You can control how likely the kernel is to use swap by altering the appropriately-named “swappiness” parameter. The default value is 60, and reducing this makes the kernel less likely to swap; on an SSD, you can probably drop this all the way to 0. To do that, add this to your “/etc/sysctl.conf” file, and then either reboot or run “sudo sysctl -p” to put it in to effect:

vm.swappiness = 0

Another parameter, “vm.vfs_cache_pressure”, is often mentioned in SSD tuning guides, too — this controls caching of directory and inode objects, with values lower than the default of 100 making the kernel more likely to keep those cached in RAM. The effect this has isn’t entirely clear to me, but if you want to experiment, add this to your “/etc/sysctl.conf” file:

vm.vfs_cache_pressure = 50

Of course, these values are just guides — if you find other values that work better for you, go with those instead. Don’t be afraid to leave these at their default values, either; Linux has a multitude of tunable paramaters, and just because you can tune something doesn’t mean you should, especially if you’re unsure what effect different values might have.

A note on drive life estimates

After two weeks, I'm yet to go through a single full erase cycle on my drive. That's reassuring!

After two weeks, I’m yet to go through a single full erase cycle on my drive. That’s reassuring!

It’s worth mentioning, too, that this 72TB estimate of the M4’s lifetime seems to be somewhat conservative. Its flash cells can handle about 3000 erase cycles before failing, so if you overwrote all 256GB of flash 3000 times, you’d get not 72TB of writes, but 750TB. The factor-of-ten disparity between these two figures is due to a phenomenon called write amplification, where the shuffling of data performed by wear leveling and garbage collection causes some data to be written to the underlying flash more than once.

The controllers inside SSDs strive to keep write amplification as close to 1 as possible (that is, no amplification), and some even use compression to push it below 1 in some cases. How successful they are depends on the several factors: the nature of the workload, how much spare flash the controller has to work with (this is where TRIM really helps), and just how good the controller’s algorithms are. A write amplification factor of 10 is really quite extreme, so I’d expect my M4 to last far beyond 72TB of writes (assuming the controller doesn’t fail first).

The 3000 erase cycles is just a conservative estimate, too — that’s when flash cells are likely to start dying, but they won’t all die at once, and most SSDs include some amount of spare flash that they can substitute for failed cells. In one endurance test, a user managed 768TB of writes to a 64GB Crucial M4; at that smaller size, that works out to more than 12000 erase cycles.