Wednesday, June 27, 2007

Why Your Distributed Performance Tests Are Lying to You: Anti-Patterns of Distributed Application Testing and Tuning - Part 1

Clustering and distributing Java applications has never been easier than it is today (see Terracotta). As a result, writing good distributed performance tests and tuning those applications is increasingly important, and many of the people doing that work could use a little help. Over my next few blog posts I'm going to cover a series of anti-patterns in this area, and I'll follow that up with a simple open distributed testing framework that I hope can help people out (hint, hint: the testing framework itself is distributed, the better to test distributed apps).

Here are the first 3 anti-patterns...

Anti-pattern 1: Single-Node “Distributed” Testing

Description

Running your “distributed” performance test inside a single JVM.

Problem

Depending on the framework, this tells you either: 1) nothing, because the clustering framework recognizes it has no partners and optimizes itself out of the picture, or 2) very little: at best it gives you an idea of the maximum theoretical read/write speed for that framework.

Solution

When evaluating the performance of any kind of clustering or distributed computing software, always use an absolute minimum of 2 nodes (preferably more).
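It helps to make the harness enforce this rule so nobody accidentally benchmarks a single node. Here's a minimal sketch (not the framework teased above); the test.nodes property name and its host list format are just assumptions for illustration:

```java
// Minimal guard: refuse to start unless the test was given at least two nodes.
// The "test.nodes" system property and host:port format are hypothetical.
import java.util.Arrays;
import java.util.List;

public class MinimumNodesCheck {
    public static void main(String[] args) {
        // e.g. java -Dtest.nodes=host1:9510,host2:9510 MinimumNodesCheck
        String nodeList = System.getProperty("test.nodes", "").trim();
        List<String> nodes = Arrays.asList(nodeList.split(","));
        if (nodeList.length() == 0 || nodes.size() < 2) {
            throw new IllegalStateException(
                    "Refusing to run a \"distributed\" performance test against fewer "
                    + "than 2 nodes; pass at least two via -Dtest.nodes=host1,host2");
        }
        System.out.println("Running against nodes: " + nodes);
        // ... hand the node list to the real load generators here ...
    }
}
```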


Anti-pattern 2: Single-Computer “Distributed” Testing

Description

Putting all (or just too many) of the resources for a performance test on one machine.

Problem

This has two problems. First, distributed applications running on the same machine have different latency and networking characteristics than distributed applications running on different machines. That difference can hide whole classes of problems around pipeline stalls, batching, and windowing.

The second problem is a variation on an anti-pattern I'll discuss later around resource contention. By running multiple JVMs on one machine you are contending for CPU, disk, and network, and potentially driving up the context-switch rate.

Solution

The only real way to test a distributed application is to run it in a truly distributed way: on multiple machines. If you must run multiple nodes/JVMs on one machine, run one of the many resource-monitoring tools and confirm you aren't resource constrained (I use iostat/vmstat for simple tests).
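If you want a crude safety net from inside the JVM as well, a small watchdog thread can sample the OS load average via standard JMX (Java 6+) and complain when the box itself looks saturated. This is only a rough sketch, the threshold is an arbitrary assumption, and it doesn't replace iostat/vmstat:

```java
// Rough resource watchdog: warn when the machine running the test JVMs
// looks overloaded, since that skews any "distributed" numbers you collect.
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadWatchdog implements Runnable {
    public void run() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cores = os.getAvailableProcessors();
        while (!Thread.currentThread().isInterrupted()) {
            double load = os.getSystemLoadAverage(); // -1 if the platform can't report it
            if (load > cores) {
                System.err.println("WARNING: load average " + load + " exceeds " + cores
                        + " cores; this box may be the real bottleneck in your test");
            }
            try { Thread.sleep(5000); } catch (InterruptedException e) { return; }
        }
    }

    public static void main(String[] args) {
        new Thread(new LoadWatchdog(), "load-watchdog").start();
        // ... start your test nodes / load generators here ...
    }
}
```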


Anti-pattern 3: Multi-Node, Load Only One

Description

Testing with multiple nodes but only sending load/work to one of those nodes, leaving the others hanging out doing little or nothing.

Problem

Depending on the distributed computing architecture chosen, the nodes that are not receiving load may actually be doing a lot of work. If that's the case, loading only one of the nodes gives a false sense of performance. Also, in some cases data is lazily loaded into nodes, so putting load on only one node could put you in the same boat as the single-node tester, where no actual clustering is happening.

Solution

When testing clustering software, make sure you are throwing load at all nodes.
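The simplest way to do that is to have the load generator round-robin work across every node instead of pointing all of its threads at one. Here's a minimal sketch; the NodeClient interface is a hypothetical stand-in for whatever client your application actually exposes:

```java
// Spread generated load across all nodes in the cluster, not just one.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class AllNodesLoadGenerator {
    // Hypothetical stand-in for your application's client API.
    interface NodeClient { void doWork(long workItem); }

    private final List<NodeClient> nodes;
    private final AtomicLong counter = new AtomicLong();

    AllNodesLoadGenerator(List<NodeClient> nodes) { this.nodes = nodes; }

    void start(int threads, final long itemsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(new Runnable() {
                public void run() {
                    for (long i = 0; i < itemsPerThread; i++) {
                        long n = counter.getAndIncrement();
                        // Round-robin: every node gets its share of the work.
                        nodes.get((int) (n % nodes.size())).doWork(n);
                    }
                }
            });
        }
        pool.shutdown();
    }
}
```

The split doesn't have to be perfectly even; the point is that every node in the cluster sees its share of both reads and writes.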


Be sure to check back soon, as the next few anti-patterns will cover the data aspects of distributed performance testing...

Thursday, June 14, 2007

Latency v Throughput

Which is the faster way to get your cargo across the United States: a plane or a train? Some might think the answer is obvious. A plane travels 500 mph (or so) and a train does maybe 80 mph, therefore the plane is faster. Or is it? The question is really a matter of latency vs. throughput.

Imagine you have to move a bunch of coal across the country and deliver it to a coal processor. Now say that on the west coast, the receiver of the coal can process 100 units of coal an hour. You have 1 train that can haul 10,000 units of coal and takes 48 hours to get to its destination. You have 1 plane that can deliver 100 units of coal in 12 hours.

If the most important thing is to have some coal soon, then the plane is faster (lower latency). But if the most important thing is to keep the coal-processing pipeline on the west coast filled over time, then the train is faster (higher throughput). Every 96 hours you get 10,000 units of coal with the train (remember there's only one train and, just like the plane, it must make the return trip to the east coast). That works out to about 100 units an hour, which is just what you need. With the plane, which also has to make its return trips, every 96 hours you get only about 400 units of coal. Not nearly fast enough.
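If you want to sanity-check the arithmetic, here's the back-of-the-envelope version using only the numbers from the example above, counting the return trips for both vehicles:

```java
// Back-of-the-envelope throughput check for the coal example.
public class LatencyVsThroughput {
    public static void main(String[] args) {
        double neededPerHour = 100.0;                         // processor burns 100 units/hour

        double trainLoad = 10000, trainRoundTripHours = 48 * 2;
        double planeLoad = 100,   planeRoundTripHours = 12 * 2;

        // Sustained throughput = units delivered per round trip / round-trip time
        System.out.printf("train:  %.1f units/hour%n", trainLoad / trainRoundTripHours); // ~104
        System.out.printf("plane:  %.1f units/hour%n", planeLoad / planeRoundTripHours); // ~4
        System.out.printf("needed: %.1f units/hour%n", neededPerHour);
        // The plane's first load lands 36 hours before the train's (lower latency),
        // but only the train keeps the pipeline full (higher throughput).
    }
}
```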

The above discussion may seem obvious, but I have this conversation all the time when talking about software: what is fast and what is slow. I've had people tell me it's impossible to do 10,000 transactions per second in Terracotta in persistent mode because disk seek time is 10 milliseconds. They would be right if everything were serialized behind a single seek. But in infrastructure software the game is throughput with acceptable latency, and it turns out 10,000 transactions per second isn't all that hard. With parallelism, batching, and windowing, the disk usually isn't even the bottleneck.
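To make the batching point concrete, here's a minimal sketch of the group-commit idea: many transactions queue up and share one physical flush, so throughput isn't capped at one transaction per 10 ms seek. This is just an illustration of the general technique, not Terracotta's actual I/O path:

```java
// Group commit sketch: worker threads append transactions to a queue, and a
// single writer thread flushes whatever has accumulated in one disk sync.
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class GroupCommitWriter implements Runnable {
    private final BlockingQueue<byte[]> pending = new LinkedBlockingQueue<byte[]>();
    private final FileOutputStream out;

    public GroupCommitWriter(FileOutputStream out) { this.out = out; }

    public void append(byte[] txn) { pending.add(txn); }   // called by many worker threads

    public void run() {
        List<byte[]> batch = new ArrayList<byte[]>();
        try {
            while (true) {
                batch.add(pending.take());   // block until at least one transaction arrives
                pending.drainTo(batch);      // then grab everything else that queued up
                for (byte[] txn : batch) {
                    out.write(txn);
                }
                out.getFD().sync();          // ONE physical flush covers the whole batch
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Even at only 100 flushes per second, a batch of 100 queued transactions per flush gets you to 10,000 transactions per second without the disk being the limit.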

Anyway, just wanted to get the throughput v latency thing off my chest.

Tuesday, June 12, 2007

Now that's fast...

Alright, I promise I'll get back to blogging about Java and Terracotta stuff next time, but... I've been reading a lot of negative press about Apple's Safari 3 beta, and while some of it is fair, I haven't seen much talk about what's good about it. So before people flame me, let me start with:

Yes, I know it has security holes and those need to be fixed
Yes, I know it has some bugs (like it doesn't work with Zimbra for me)

But...

It still has all the features from Safari 2:

  • Reset browser, so when you're surfing on someone else's computer you can clean up everything you've logged into, like e-mail.
  • Really good tabbed browsing
  • Private browsing (for when you want to pause the caching and recording of what you're browsing)
  • plus RSS, popup blocking and the other usual suspects.

Good new stuff in 3.0:

This thing is blindingly fast. I haven't taken actual timings, but just from eyeballing it, it is noticeably faster than the already fast Safari 2.0, and much, much faster than Firefox. I don't have the time to do real benchmarks on this, but I would love to see some.

Much improved inner search. How many times have I hit Command-F, typed some text, and seen the window move but not been able to find where the highlighted word is? Safari 3 does a really nice animated bubble highlight that is impossible to miss. Kudos to the Apple guys for simple, subtle improvements.

Plus, I think it is supposed to be more standards compliant, and it has a resizable-textbox feature which I haven't tried yet. Update: works great, but worth noting that it only works for multi-line text fields, not the single-line variety.
Update 2: Someone pointed out that Safari 3 also now has WYSIWYG editor support. I should have mentioned it, since I used it when writing this blog post :-)

Anyway, I don't want to sound like a fanboy, and I might be alone, but I actually like Safari 3.0 and think Windows users should give it a go too.