Transparent Storage

In preparation for that Disk Performance piece and accompanying keynote last week, I spent quite a bit of time with the new Unified Storage “Fishworks” Analytics Software, which is fascinating stuff. Herewith an illustrated report.

[Disclosure: This piece is an unqualified rave about a new product from my employer; go elsewhere if such things offend you. Those who read me regularly know that I don’t often do this; it’s because this one hits close to an area I care about deeply, and I’m genuinely knocked out.]

[Update: It turns out that there’s a nice slide-show with tons of details about what the analytics do: Analytics in the Sun 7000 Series (PDF).]

Conclusion · If your application’s throughput depends on storage performance, you need this. (Or something like it; I don’t know what the competition has on offer.) Otherwise you’re flying blind.

What Happened Was · I wanted to run Bonnie on the new gear, so I asked the Fishworks guys, and not only did they set me up a test system, Brendan Gregg ran Bonnie himself a couple times first and sent some thought-provoking Analytics screen shots.

So when I was doing my own runs, I pointed a browser at the 7410’s built-in server and had a look. I suppose that I am sort of the ideal customer for this stuff, given that I have an I/O-intensive app (Bonnie) that I claim to understand pretty deeply, and would really like to know what’s happening behind the curtain while it’s running.

It takes a long time to do a big Bonnie run, and not very long to learn how to use the Analytics. What the screenshots fail to show is how engrossing this thing is while it’s running; the graphs flow across the screen updating themselves in real time via AJAX magic. If you’re an I/O-performance weenie, you can get sucked in and lose a half-hour at a time just watching ’em. Another nice thing is that the system organizes your data into “worksheets”, which are handy to save and restore.

Anyhow, let’s look at a picture.

This (you’ll probably need to enlarge it) is taken during the Bonnie phase where the program runs through a big file (256G in this run) sequentially, reading each 16K chunk of the file, dirtying it, and writing it back out in place. What is this trying to tell us?

The chart at the top shows the network traffic the appliance is seeing, which, if you think about it, represents what the client is trying to do. Since this is from the appliance’s point of view, “out” means the client is reading and “in” means that it’s writing. The numbers of reads and writes are about the same, which is what you’d expect.

But the second graph, of actual disk I/O, tells a different story. Clearly it’s not caching the reads much (the steady orange part), but it’s saving up the writes (the spiky blue part) and committing them to disk every five to ten seconds. When it’s writing, it gets busy and the client sees a pause in the traffic for part of a second. Unsurprising, but you’d never know without this kind of tool.

The bottom graph shows where in the 11-TB filesystem the I/O is actually happening. The answer is “all over the place”, since ZFS is a copy-on-write filesystem and isn’t actually overwriting the file’s original blocks, but putting the new data somewhere else.
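For anyone who hasn’t read the Bonnie source, here is roughly what that rewrite phase does, as a minimal Python sketch. This is not Bonnie’s actual code (Bonnie is written in C); the function name `rewrite_in_place` and the choice to dirty a chunk by bumping its first byte are my own illustrative assumptions. Only the 16K chunk size and the read, dirty, write-back-in-place pattern come from the run described above.

```python
CHUNK = 16 * 1024  # 16K chunks, matching the run described in the text


def rewrite_in_place(path, chunk=CHUNK):
    """Walk a file sequentially: read a chunk, dirty it, write it back in place.

    Hypothetical helper for illustration; returns the number of chunks rewritten.
    """
    chunks = 0
    with open(path, "r+b") as f:
        while True:
            offset = f.tell()                # remember where this chunk starts
            buf = bytearray(f.read(chunk))
            if not buf:                      # end of file
                break
            buf[0] = (buf[0] + 1) % 256      # "dirty" the chunk (assumed method)
            f.seek(offset)                   # back up to the start of the chunk
            f.write(buf)                     # overwrite the same byte range
            chunks += 1
    return chunks
```

From the client’s side every chunk is one read followed by one write of the same size, which is why the “in” and “out” network numbers come out about equal.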

Asking Questions · At this point it occurred to me to wonder “am I the only person stressing this appliance out?” So I hit the “Add statistic” link at the top of the screen and a bit of poking around revealed that it would graph not only which clients were hitting the appliance but exactly which files they were hitting. I won’t show you that graph because it was boring, revealing that Bonnie’s scratch file was getting all the traffic. Which answered the question: “Yes, I’m the only person sending work to this thing.”

At that point, I started getting interested in that row of little icons above each chart, and the picture of the power drill led me to “Drill Down”. So I drilled down on the actual Bonnie scratch file, and the bottom chart in the screen below is a graph of the file (not filesystem) offsets where the I/O was happening.

By this time, the rewrite phase had completed and (I think) the block-read phase had just started; you can actually see Bonnie walking through the file at a rate of around 10 gigabytes a minute, which is not bad for a single-threaded program.

I ended up asking the system a bunch of obvious questions about Bonnie and was surprised by only a couple of the answers; those deserve further research and maybe a little bit of Bonnie re-engineering.

Last Picture · Here’s a graph of Bonnie’s final phase, when she seeks around in that big honking file to a few thousand random locations, reading blocks, dirtying 10% of them, and writing those back out.

You can see the end of the block-read phase, a few seconds’ delay while Bonnie warms up the parallel processes that do the seeking, and then, in the lower graph, the software probing all over the file. There are two things about this graph that really puzzle me, and I’m going to need to spend some quality time with the Bonnie code and the Fishworks analytics to figure out what’s going on. One of the problems should be obvious.
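For reference, the seek phase described above boils down to something like the following Python sketch. It is a simplified, single-process illustration, not Bonnie’s code: the real thing spreads the seeking over parallel processes, and the name `seek_phase`, the default of 4000 seeks, the 16K block size, and the way a block gets dirtied are all assumptions I’ve made to have something runnable.

```python
import os
import random


def seek_phase(path, seeks=4000, chunk=16 * 1024, dirty_fraction=0.10, seed=None):
    """Seek to random block-aligned offsets, read each block, rewrite some of them.

    Hypothetical helper for illustration; returns how many blocks were dirtied.
    """
    rng = random.Random(seed)
    nblocks = os.path.getsize(path) // chunk
    if nblocks == 0:
        raise ValueError("file is smaller than one block")
    dirtied = 0
    with open(path, "r+b") as f:
        for _ in range(seeks):
            offset = rng.randrange(nblocks) * chunk  # random block in the file
            f.seek(offset)
            buf = bytearray(f.read(chunk))
            if rng.random() < dirty_fraction:        # roughly 10% by default
                buf[0] = (buf[0] + 1) % 256          # dirty the block (assumed method)
                f.seek(offset)
                f.write(buf)                         # write it back in place
                dirtied += 1
    return dirtied
```

With offsets picked uniformly like this, you’d expect the per-file offset graph to look like scattered dots across the whole file rather than the steady diagonal of the sequential phases.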

Behind the Curtain · Bryan Cantrill has been writing about the construction of the new storage products in general and these analytics in particular, for example in On Modalities and Misadventures. It’s all done with JavaScript, which is hard to argue with, and XML-RPC, which seems less awful than you’d expect in a tightly-coupled scenario like this. I’d quibble if I weren’t so engrossed by the output.

Take-Aways · Next time I have to wrestle with an I/O-bound system, and I’ve never worked on a big system that wasn’t at some point I/O-bound, I’ll totally insist on this level of diagnostic readout. Because otherwise I’d be working in the dark.