2008 Storage Hierarchy

We call them “computers”, but the software and hardware are overwhelmingly concerned with storing and retrieving data. Yesterday’s Disk Performance research fit into a larger context in the QConf presentation: a survey of all the levels of storage that make up a modern system.

Level	Typical latency	Typical throughput
Registers	< 1 nsec	n/a
Cache	< 10 nsec	> 100 GB/sec
DRAM	< 500 nsec	> 1 GB/sec
Dist. hash table	< 50 µsec	n/a
SSD	< 200 µsec	> 100 MB/sec
Disk	< 10 msec	> 100 MB/sec
Tape	< 60 sec	> 100 MB/sec

Now for the explanation:

Registers · These live right inside the CPU and operate at its cycle speed; a single instruction can involve more than one.

Cache · Since CPUs have been getting faster much faster than DRAM has, computers all have cache between the processor and main memory. There may be multiple levels, referred to as “L1”, “L2”, and “L3”; the number and performance of these levels vary from system to system.

DRAM · Also known as “main memory”; present in amounts ranging from single digits to hundreds of gigabytes.

Distributed Hash Table · memcached and friends; deployed in nearly all modern Web applications. This is a computer with a whole lot of DRAM, connected via a fast network, with the memory used as a lookaside cache to avoid unnecessary database accesses.
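
The lookaside pattern itself fits in a few lines. What follows is only a sketch, not memcached's API: a plain Python dict stands in for the networked cache, and `LookasideCache`, `slow_database_lookup`, and the key `user:42` are hypothetical names made up for illustration.

```python
from typing import Callable, Dict, List


class LookasideCache:
    """Check the cache first; go to the backing store only on a miss."""

    def __init__(self, load_from_store: Callable[[str], str]) -> None:
        # A dict stands in for the networked DRAM cache (an assumption
        # made so the sketch is self-contained).
        self._cache: Dict[str, str] = {}
        self._load_from_store = load_from_store
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Miss: pay for the slow lookup once, then remember the answer.
        self.misses += 1
        value = self._load_from_store(key)
        self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # Drop a cached entry, for example after the record changes.
        self._cache.pop(key, None)


database_calls: List[str] = []


def slow_database_lookup(key: str) -> str:
    # Hypothetical stand-in for a real database query.
    database_calls.append(key)
    return f"row-for-{key}"


cache = LookasideCache(slow_database_lookup)
first = cache.get("user:42")   # miss, goes to the database
second = cache.get("user:42")  # hit, served from memory
```

The second `get` never touches the database, which is the whole point. A real deployment also has to decide when to invalidate or expire entries, which this sketch leaves to the caller.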

SSD and Disk · See yesterday’s 2008 Disk Performance.

Tape · It’s not going away, particularly in a regulation-heavy environment where businesses are increasingly required to store everything forever. These days, tape subsystems include a robot mechanism to mount and dismount tape cartridges; it’s all very hands-off.

Methodology · No magic here, just Web search and digging for publicly available data, plus a few Bonnie runs. I’m sure there are widely-used computers out there with numbers outside of the ranges I’ve suggested at most levels of the hierarchy. But I think these are pretty representative, and what really matters is the order-of-magnitude relationships between them; that is the thing system designers need to keep in mind as they ponder their storage options.
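
To make those relationships concrete, here is a small sketch that takes the latency figures from the table and prints the gap between adjacent levels as a power of ten. The table gives upper bounds ("<"), so treat the results as rough; the function name `orders_of_magnitude` is just an illustrative choice.

```python
import math

# Upper-bound latencies from the table above, converted to seconds.
LATENCY_SECONDS = {
    "Registers": 1e-9,
    "Cache": 10e-9,
    "DRAM": 500e-9,
    "Dist. hash table": 50e-6,
    "SSD": 200e-6,
    "Disk": 10e-3,
    "Tape": 60.0,
}


def orders_of_magnitude(faster: str, slower: str) -> float:
    """Return log10 of the latency ratio between two levels."""
    return math.log10(LATENCY_SECONDS[slower] / LATENCY_SECONDS[faster])


# Walk down the hierarchy, comparing each level with the next one.
levels = list(LATENCY_SECONDS)
for upper, lower in zip(levels, levels[1:]):
    gap = orders_of_magnitude(upper, lower)
    print(f"{upper:>16} -> {lower:<16} {gap:4.1f} orders of magnitude")
```

On these figures the whole span from registers to tape is a bit under eleven orders of magnitude, and the single biggest step is from disk to tape.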

Oh, and obviously these numbers are highly time-sensitive; they were different last year and they’ll be different next year.

Contributions

From: Brian Von Herzen (Nov 21 2008, at 12:05)

You might want to add a 3rd column to your table: capacity per device, thus illustrating the trade-off between bandwidth and capacity and latency and capacity.

brianvon@fpga.com

From: Steve Loughran (Nov 21 2008, at 12:12)

There's also power consumption, which could be used to measure joules per unit of useful work (say, J to get a gigabyte to somewhere for a CPU to work on), J/GB/month to store, and J/GB to write.

SSD is low in J/GB to read and J/GB/month to store (purely the HVAC costs of the room); writing is a bit higher. Tape has a much lower J/GB/month, depending on where it is stored, though retrieval can be trickier. When you RAID-stripe HDDs, their monthly energy costs increase, and SANs, well, they add more. I'd like to look at the energy cost of real versus virtual Hadoop clusters.

From: Sam Greenfield (Nov 21 2008, at 13:28)

I don't know if I would call tape hands-off. I have never worked with a robotic system for nearline storage (optical or tape) that hasn't required some sort of TLC when the system jams in some way.

From: Jeremy Zawodny (Nov 21 2008, at 14:12)

Power consumption is an interesting angle indeed. Since coming to work at Craigslist, it's something I've thought about more than I ever did at Yahoo.

When I requested that some machines get upgraded from 16GB to 32GB, a sysadmin asked if I was sure I needed it--because more RAM == more power.

That was something I never expected, but it made complete sense.

From: Steve Loughran (Nov 22 2008, at 06:26)

Yes, RAM = power. When you consider that a lot of RAM is used as a cache for the HDD, in-memory storage to handle the latencies of disk, then it's possible that SSD front-end boxes need less RAM. But that may need some OS tuning so its expectations/assumptions are different.

From: Pádraig Brady (Nov 25 2008, at 01:46)

That drop in latency from Disk -> SSD is going to have huge implications, and will simplify a lot of things especially in a multicore environment. I commented on that here:

http://www.pixelbeat.org/docs/memory_hierarchy/

From: Martin Probst (Nov 25 2008, at 02:58)

Is the number for distributed hash tables including the time for eventual cache misses?

I always thought the nice feature of SSDs is that you can have persistent data storage with access times on the order of memcached (or close enough), and that this storage can be _much_ larger than your typical memcached cluster at a lower price tag.

From: Edward Vielmetti (Dec 12 2008, at 18:13)

On the power consumption question, I collected a set of anecdotes (or maybe data points) around power consumption measured in terabytes per kilowatt.

http://vielmetti.typepad.com/vacuum/2008/06/kilowatts-per-t.html

That table looks like (as of 6/08):

SSD: 0.4T/KW
Fiber channel: 4T/KW
SATA: 17T/KW
Petabox disk (2005): 20T/KW
Petabox disk (2008): 37T/KW
COPAN MAID: 100T/KW ("mostly idle disks")
Blu-ray read-only farm: 500T/KW

The last is really weird - think of a stack of blu-ray drives waiting to spin up and deliver the data when you need them.

It should be straightforward to scale this to joules/month, though as with any computing activity actual performance is highly dependent on load patterns.
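
A rough sketch of that scaling, using the figures listed above. It assumes constant power draw and a 30-day month, neither of which comes from the data itself, and `joules_per_tb_month` is just an illustrative name:

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month
WATTS_PER_KILOWATT = 1000

# Figures from the list above, in terabytes per kilowatt.
TB_PER_KW = {
    "SSD": 0.4,
    "Fiber channel": 4,
    "SATA": 17,
    "Petabox disk (2005)": 20,
    "Petabox disk (2008)": 37,
    "COPAN MAID": 100,
    "Blu-ray read-only farm": 500,
}


def joules_per_tb_month(tb_per_kw: float) -> float:
    """Energy to keep one terabyte powered for a month at constant draw."""
    # One kilowatt sustained for a month, expressed in joules (watt-seconds).
    joules_per_kw_month = WATTS_PER_KILOWATT * SECONDS_PER_MONTH
    return joules_per_kw_month / tb_per_kw


for name, density in TB_PER_KW.items():
    print(f"{name:<24} {joules_per_tb_month(density):.3g} J/TB/month")
```

One kilowatt held for 30 days is about 2.6 billion joules, so each entry is just that number divided by its terabytes per kilowatt. Real load patterns will move these numbers around, as noted above.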

November 21, 2008

By Tim Bray.
