ScaleAholic: 2007

Monday, September 10, 2007

3.5 Rules in Distributed Algorithm Design

In highly concurrent distributed computing their are all kinds of algorithms one can learn. One can read books like Concurrent and Distributed Programming in Java and Distributed Systems: Principles and Paradigms. One can learn how to write scalable servers by reading things like these papers on SEDA. But when creating a distributed algorithm their are 3 simple Goals/Rules you need to follow. While these rules were developed with the Terracotta approach in mind they are mostly applicable to any distributed computing approach:

Use algorithms that allow for maximum concurrency

Seems obvious but design your algorithm to allow maximum concurrency. In a single JVM concurrency is important. In a distributed environment where lock acquire and release is almost certainly more expensive it is that much more important
Use read locks where possible. If you can have multiple readers look at something, duh, don't stop them.
Try strategies like striping, lock on fine grained objects etc (this is some of what concurrent hash maps do).

Minimize chatter

Many really cool concurrent algorithms are less good for distributed computing because of the amount of chatter (data that needs to be communicated between nodes).
Algorithms that require too much cross node book keeping are a problem. Networks are slow and relatively thin pipes. Chattiness plays into that weakness and can also eat cpu.

Take advantage of locality of reference

Use algorithms that partition well

Use algorithms that can mostly or entirely act on local data.

Scale-out architectures don't have all the data everywhere. They rely on various levels of caching both near and far for optimal performance. When hitting data try to hit data in the same node where possible.

Rule 3.5 exists as well. Remember not to guess at the performance of something. Test it, time it and automate as much as possible of those tests and timings.

This has been a public service announcement.

Saturday, July 21, 2007

A Job building Terracotta

Opportunistic would be the best way to describe our hiring style at Terracotta. We always have our eyes open for the super sharp so I figured I would post about what we generally look for and see if anyone out there shows up to make more magic happen for us. I talked a bit about what I think people should look for when hiring in my blog on teams.

If you are:

Passionate - Do you actually love having complex problems to solve. Can you not sleep at night thinking about how best to design, factor, improve software. Are you always striving for improvement and learning.

Knowledgeable - I'm not going to list a bunch of frameworks here. The knowledge we mostly look for is about knowing how to solve tough problems. The one thing that is a must is you must know multi-threaded programming. Having experience building some kind of infrastructure software is a big plus. Whether it's building an App server, a database or jms system isn't so important but knowing how to build things that need to be scalable is a big plus (just to be clear, not using those things, building them). For some of our stuff knowing classloaders cold and byte code manipulation is a big plus as well. Most of all you must love and be good at writing software.

Respect/Teamwork - This is complex stuff and we are not a big company so you need to be able to talk to and work with others. No room for people who are pedantic and or self serving. It has to be about the product and the software for you not the title. This is true no matter what role you are looking at filling.

Judgement - Each person has to have excellent judgment. Must be able to focus on what's important. If you don't know something, that's not a problem but communicate, ask questions don't rat hole and don't hide problems.

Intelligence - Not going to lie. You must be smart. You should be a person who can analyze and design complex algorithms. Debug and solve complex problems. We will ask you to answer algorithm questions and write some code in the interview.

I know that's a pretty tough list but if you think what we do is cool and fit most of that please shoot your resume to careers at terracottatech dot com.

For those who don't already know about Terracotta here is a brief overview:

We are a well funded startup based in San Francisco but with developers all over the world. Our product is open source network attached memory for Java and is probably the most interesting and diverse product one could work on stretching from distributed computing, to byte code manipulation.

Thursday, July 19, 2007

What were those results again?

This is a small follow up to my blogs about anti-patterns. When trying to debug a complex logic or performance problem one of the most important things one can do is take notes. I had a conversation with someone the other day where the person said, "I'm in a rush, I don't have time to take notes about my runs." To this I replied, you don't have time not too.

So often we are in such a rush to solve a problem that we cut the wrong corners. When performance tuning, and/or tracking something down that requires multiple runs or configurations of your software always always always take notes on each run. They don't have to be super formal but you should write down all the details you can think of. Some examples include, what were my settings, what did the cpu usage/machine stats look like. What problems did I run into. Always keep a date/time stamp on the tests. This will prevent the inevitable rerunning of tests because you forgot the results, or mixing up what you have tried and not tried. It only takes one mistake to use up more time than tons of note taking would require.

Friday, July 13, 2007

More Lies - Distributed Performance Testing Anti-patterns Part 3 of 4

In part 3 out of 4 of this blog, much like parts 1 and 2 I will hit on anti-patterns that allow your performance testing of clustered and or distributed software to lie to you. I'll be following up part 3 of this blog, the last 4 anti-patterns, with a blog about a simple distributed testing framework I have begun. Hopefully enough of you will be interested, try it and maybe even contribute to it.

Anti-pattern 7: In-memory vs. Distributed Performance Comparison

Description

Writing a test that compares the speed of adding objects to a local, in-memory data structure vs. adding objects to a clustered data structure.

Problem

To avoid suspense, I'll tell you the results of that test without running it. Adding things to a local, in-memory data structure stakes virtually no time at all. In-memory object changes happen so fast, they are hard to even measure. However, when you are making changes to a distributed data structure, no matter what, those state changes have to be shipped off to another location. This takes instructions to be executed to make this happen on top of the ones used for the original task. This isn't just slower, it is way slower. The comparison between in-memory object changes and distributed object changes is useless.

Solution

Figure out how much data you are going to be clustering and what the usage patterns of that data becoming clustered will be. Then simulate and time that. Once again, focus on total throughput with acceptable latency.

Anti-pattern 8: Ignore Real-world Cross-node Patterns

Description

Reading and writing the same data in every node.

Problem

Generally speaking, whether reading or writing, it is more expensive to access the same data concurrently across all nodes. Depending on the underlying clustering infrastructure, this can be more or less of a problem. If you are using an “everything everywhere” strategy, the performance hit of random access across all the data on all the nodes is less, but the “everything everywhere” sharing strategy generally does not scale well. Most other strategies perform better when data access is consistently read and or written from the same node

Solution

Write your performance tests in a way that allows you to set a percentage for locality of reference. Is an object accessed on the same node 80%, 90%, or 99% of the time? You should usually have some cross-node chatter, but usually not too much—although you should be as realistic to the problem you are trying to solve as possible.

Anti-pattern 9: Ignore Usage Patterns

Description

The performance test either just creates objects or just reads objects

Problem

In the real world, an application does a certain amount of reading, writing, and updating of shared objects. And those reads, writes, and updates are of certain sizes.

Solution

If your app likely changes only a few fields in a large object graph, then that is what your performance test should do. If your app is 90% read from multiple threads and 10% write from multiple threads than that is what your test should do. Make your test be true to what you need when it comes to data and usage.

Anti-pattern 10: Log Yourself to Death

Description

Last, but far from least, doing extra stuff like writing data out to a log chews up CPU. Logging too much in any performance test can render the test results meaningless.
This anti-pattern generally covers any extra CPU usage on a load-generating client that affects the performance test. In general, if one or more of your nodes is CPU bound in a cluster performance test, you likely have not maxed-out the performance of your cluster. Let me say that again, if you are resource constrained on any node, including your load generating nodes (but not including your server if one exists) then you are probably not maxing out what your cluster as a whole can handle. Investigate further.

Problem

If the individual load-generating nodes—or even the clustered nodes—are resource constrained, it is likely to create a false bottleneck in your test. You are trying to figure out the throughput of the cluster and your cluster nodes are likely busy doing other things like logging.

Solution

First, always have machine monitoring on all nodes in a performance test. Any time one of the nodes or load generators becomes resource constrained make sure you test with an additional node and see if it adds to the scale. If a node is unexpectedly resource constrained, then take a series of thread dumps (java only) and figure out where all the time is going.

Alright, that is the end of my anti-pattern list for now. I could probably come up with a few more but I'll save them for another day. The moral of this section of the blog is to be curious and skeptical with your testing results. Don't just ask what the numbers are. Find out why and you will end up a much happier person.

Monday, July 2, 2007

Distributed Performance Testing Anti-patterns Part 2 of 4

In Part 1 of this 4 part blog I hit upon 3 Anti-Patterns that can make one's performance testing a poor representation of reality. Here I'm covering 3 more and will be following up with the last 4 in a few days. After that I'm going to talk about a simple distributed performance testing framework I'm going to give away to try and help people be more successful with this stuff.

Anti-pattern 4: Fake Data Fake Performance

Description:

Using data in a distributed performance test that looks nothing like your real data.

Problem:

Distributed computing solutions use all kinds of strategies to move data between nodes under the covers. Just representing a size of data to be shared ignores those strategies and in many cases misrepresents the performance of a real system with real data under real load, both positively and negatively. You may be testing specially optimized flattening tricks that make the system look faster than it is; likewise, you may be testing a particular case that doesn’t perform well, but that isn’t representative of the true performance of the system with real data.

Solution:

Make sure you test with object graphs that vary in size, type, and depth in similar ways to the data you plan to use in your application. Don't assume Maps of Strings will behave anything like the way real object data will behave.

Anti-pattern 5: Incoherent Cluster

Description:

Some clustering products are coherent, some are not, and some have both modes. Don't ignore whether you are testing the performance using the mode you really need for your application.

Problem:

While it is quite possible to have a coherent cluster that has the same throughput as an incoherent cluster, it is certainly harder to do. Coherently clustered software frameworks require the provider to do some fancy locking, batching, windowing, and coherent lazy-loading tricks that aren't for the faint of heart (in the internals of the clustering engine, that is, not for the application developer). You can't assume that performance between a coherent and incoherent clustering approach will be the same.

Solution:

Make sure that if what you need is coherently clustered data that you are actually testing that way. Also, if it’s coherence you’re after, it’s a good idea to verify the end-state of a performance test to make sure the system actually is coherent. Sort of post test verify phase.

Anti-pattern 6: The World by a Thread

Description:

Distributed tests that only use one thread per node.

Problem:

For most clustered software, the name of the game is throughput with acceptable latency. Pretty much all distributed computing software does batching and windowing to improve throughput in a multi-threaded environment. Maxing out a single thread will usually not even approach the max throughput of the JVM or the system as a whole in the same way that a single node will not.

Solution:

Make sure your test uses multiple threads for generating load in each JVM. Check to see if you are cpu bound on any node. If you are not cpu bound you might have a concurrency issue or just need to add more threads.

Conclusion:

I have 4 more anti-patterns that I'm going to publish next week. Keeping an eye on the full 10 will help greatly reduce mistakes in clustering and distributed computing. Once again I'll then be following up with a framework to help develop and run useful tests.

Wednesday, June 27, 2007

Why Your Distributed Performance Tests Are Lying to You: Anti-Patterns of Distributed Application Testing and Tuning - Part 1

Clustering and distributing Java applications has never been easier than it is today (see Terracotta). As a result, writing good distributed performance tests and tuning those applications is increasingly important. Performance tuning and testing of distributed and/or clustered applications is an important skill and many who do it can use a little help. Over my next few blogs I'm going to cover a series of anti-patterns in this area. I'll be following it up with a simple open distributed testing framework that I hope can help people out (hint, hint, the testing framework itself is distributed to best test distributed apps).

Here are the first 3 anti-patterns...

Anti-pattern 1: Single-Node “Distributed” Testing

Description

Running your “distributed” performance test inside a single JVM.

Problem

Depending on the framework, this can tell you either: 1) nothing, because the clustering framework recognizes it has no partners so optimizes itself out or 2) very little—it might give one an idea of maximum theoretical read/write speed for that framework.

Solution

When trying to evaluate the performance of any kind of clustering or distributed computing software, always use an absolute minimum of 2 nodes (Preferably more).

Anti-pattern 2: Single-Computer “Distributed” Testing

Description

Putting all (or just too many) of the resources for a performance test on one machine.

Problem

This has two problems. First, distributed applications running on the same machine have different latency and networking characteristics than distributed applications on different machines. This can hide various classes of problems around pipeline stalls, batching, and windowing issues.

The second problem is a variation on another anti-pattern I will discuss later around resource contention. By running multiple JVMs on one machine you are now contending for CPU, disk, network, and potentially affecting context switch rate, etc.

Solution

The only real way to test a distributed application is to run it in a truly distributed way: on multiple machines. If you must have multiple nodes/JVMs on one machine, make sure you are running one of the many resource-monitoring tools and make sure you aren't resource constrained (I use iostat/vmstat for simple tests).

Anti-pattern 3: Multi-Node, Load Only One

Description

Testing with multiple nodes but only sending load/work to one of those nodes while leaving the others just hanging out doing little or nothing

Problem

Depending on the distributed computing architecture chosen, the nodes that are not receiving load may be actually doing a lot of work. If that's the case, only loading one of the nodes is giving a false sense of performance. Also, in some cases, data is lazily loaded into nodes so only putting load on one node could be putting you in the same boat as the single-node tester where no actual clustering is happening.

Solution

When testing clustering software, make sure you are throwing load at all nodes.

Be sure to check back soon as the next few anti-patterns will cover the data aspects of distributed performance testing...

Thursday, June 14, 2007

Latency v Throughput

Which is the faster way to get your cargo across the United States. A plane or a train? Some might think the answer is obvious. A plane travels 500 mph (or so) and a train does maybe 80 mph. Therefore the plane is faster. Or is it? The question is really a matter of latency vs. throughput.

Imagine you have to move a bunch of coal across the country and deliver it to a coal processor. Now say that on the west coast, the receiver of the coal can process 100 units of coal an hour. You have 1 train that can haul 10,000 units of coal and takes 48 hours to get to its destination. You have 1 plane that can deliver 100 units of coal in 12 hours.

If the most important thing was to have the coal soon, then the plane is faster (lower latency). But, if the most important thing is to have the coal-processing pipeline filled on the west coast over time then train is faster (higher throughput). Every 96 hours they get 10k units of coal with the train (remember there’s only one train and, just like the plane, it must make the return trip to the east coast). That works out to about 100 units an hour which is just what you need. With the plane, every 96 hours you get 800 pounds of coal. Not nearly fast enough.

The above discussion may seem obvious but I have this conversation all the time when talking about Software: what is fast and what is slow. I've had people tell me it's impossible to do 10 thousand transactions per second in Terracotta when persistent because the disk seek time is 10 millis. Well they would be right if you serialize things. But in infrastructure software, the game is throughput with acceptable latency and it turns out 10 thousand transactions per second isn't all that hard. With parallelism, batching, and windowing, the disk isn't even usually the bottleneck.

Anyway, just wanted to get the throughput v latency thing off my chest.

Tuesday, June 12, 2007

Now that's fast...

Alright, I promise I'll get back to blogging about Java and Terracotta stuff next time but... I've been reading a lot of negative press about Apple's Safari 3 beta and while some it is fair I haven't seen a lot of talk about the good stuff about it. So before people flame me let me start with:

Yes, I know it has security holes and those need to be fixed
Yes, I know it has some bugs (like it doesn't work with Zimbra for me)

But...

It still has all the features from Safari 2:

reset browser so when your surfing on someone else's computer you can clean up everything you've logged into like e-mail.
Really good tabbed browsing
Private browsing (for when you want to pause the caching and recording of what your browsing)
plus RSS, popup blocking and the other usual suspects.

Good new stuff in 3.0:

This thing is blindingly fast? I haven't taken actual timings but just from eyeballing it this thing is super fast. It is much faster than what was already fast Safari 2.0. And much much faster than Firefox. I don't have the time to do real benchmarks on this but I would love to see some.

Much improved inner search. How many times have I hit command F, typed some text, seen the window move but not be able to find where the highlighted word is. Safari does a really nice animated bubble highlight that is impossible to miss. Kudos to the Apple guys for simple subtle improvements.

Plus I think it is supposed to be more standards compliant and it has this resize textbox feature which I haven't tried yet. update: Works great but worth noting that it only works for multi-line test fields not the single line variety.

update2: Someone pointed out that Safari3 also now has WYSIWYG editor support. Should have mentioned it since I used that when writing the blog :-)

Anyway, don't want to sound like a fanboy boy, and I might be alone, but I actually like Safari 3.0 and think windows users should give it a go to.

Saturday, May 19, 2007

Performance Architecture

I was reading this article from another guy named Steve Harris (not me). I think he makes some interesting points but I thought I would follow up. Optimizing architecture to me mostly means designing for flexibility and testability. It also means having a way to test ones app for performance and scale. I've seen a number of people write apps, put them in prod without ever performance testing them.

Write your app so that you can overlay the performance architecture after the fact. Have a way to EASILY validate your performance and you'll be ok. Optimizing code early (which he doesn't mention) is actually quite evil and almost always leads to badness.

Teams, starts with the people

What are the needed traits across a company to have the greatest chance of success? How much of each trait does each individual have to have? These are questions that everyone should ask themselves on both sides of the interview process and in ones everyday life.

My list is as follows (not in any order):

Passion/Pride/Work ethic
Knowledge/experience
Respect/Teamwork
Judgement
Intelligence

The list itself can vary though probably not as widely as how much one values each item on that list. The most important thing is to have the list and to use it as a framework for how you choose jobs, hire, value and motivate existing people, and at times part ways.

Since words tend to have very different meanings to different people despite the existence of dictionaries and wiki's I'm going to cover a bit what my list means to me.

Passion/Pride/Work Ethic

First let me start with what this doesn't mean. Passion and pride does not have anything to do with outward energy. One does not have to be "Rah Rah" to have passion and pride. Many of the people who I feel have had the most passion have actually been super quiet. Passion is about commitment to the task at hand and the greater goal. It's about going the extra mile to get things done well, on time, and sometimes in ways that nobody else thought of. It's about finding the things that need to get done that nobody noticed and being the person to do them while not dropping the things they are supposed to be doing :-).

Knowledge

Knowledge is what you "know" coming into a problem. Intelligence is what you can learn/how well one solves problems. In many cases knowledge is the least important of the five. For given positions one should always give some thought to how much specific knowledge is actually needed. I constantly see job postings with a long list of knowledge items and almost no indication of the other 4 items importance. IMHO If you can find people who excel in the other four areas, knowledge of a given specific task can be lower (though not zero. Working with people who have zero knowledge of their domain can be very expensive. It's less of an issue for large companies with time and money).

Respect/teamwork

I focus on respect more than one might imagine. I worked with a guy who said he spends 75 percent of his time in an interview trying to figure out if the person will work well in the team. I think that's a good idea. It's not about whether the person is social or fun. It's about whether they present their ideas with an understanding that those around may have better ones. Do they listen to feedback, even from people they do not expect to have good ideas. Do they listen in general or do they just wait for their turn to talk. This is important stuff. Don't underestimate it.

Judgement

This is another really important one. Their is a nice coverage of how important this is in the book "Producing Open Source Software.(A free version of this book exists)" Whether someone is your boss, your co-worker or your employee, you need to be able to trust their judgement more than anything else. If the people around you are smart but have bad judgement they will create excellent solutions to the wrong problems. If people are not smart and have bad judgement they will create poor solutions to the wrong problems. On the other side of the coin, if they are smart and have good judgement they will create great solutions to the right problems. Most importantly, if they are not smart (and on any given day all of us are not smart) but have excellent judgement they will still make good decisions about what to do, and most usefully, what not to do. While I find it really really hard to interview for, Judgement is huge!

Intelligence

Having some Intelligence is important. More important for some jobs than for others. When hiring, know how much intelligence (balanced against the others) is needed for the position/type of work. For some jobs, judgement and hard work with some intelligence is what is needed. Have questions in mind that will indicate whether a person is smart enough. Since intelligence is the easiest of the bunch to test for Don't Forget to Do it! To many people mistake liking someone, "talking the talk" or even looks for intelligence. Do the work, have your intelligence questions in mind when you interview someone.

The above are some high order bits. When I again get some time and motivation I want to cover team building from a balance perspective. Recognizing and filling ones personal and team gaps.

Ok, it's 2:30 am and I'm starting to question my judgement for being up so late. What's my point here? Think about what you want in employees coworkers and bosses. Sometimes even assign numbers to the core values you look for so you can compare people.

I read that we make up our mind about people we meet in seconds. Before they even speak in most cases and that those first impressions are VERY hard to overcome. So have the tools ready to make wise informed decisions and fight past human nature.

Thursday, April 5, 2007

The int size problem...

I'm just curious. Of the people out there using 64bit jvms, has anyone run into the int collection size problem? One of the things I've been quitely worried about for awhile now is that with both 64 bit jvm's and the virtual heap stuff provided by Terracotta DSO it is possible for an ArrayList or a HashMap to grow beyond what an integer can hold. Sure 2 gig of objects is a ton to be in a collection but if one has a 128g heap it isn't crazy anymore.
It's easy to make the collections able to handle the larger sizes but so much code is written to the current collection interfaces it might be a problem. Have the smart people at Sun started tackling this already? Maybe some JSR?

Saturday, January 27, 2007

Programming by Wishful Thinking

I'm currently reading an excellent book called Everyday Scripting. I'm really enjoying it as a nice primer on whipping things up with Ruby. When I hit chapter 7 the author began to discuss a programming style that I have used for a while now. I had no idea this simple technique had a name. Many of the things I've picked up throughout my career I've learned from books, articles, reading code, and still others I've devised from first principals. This particular practice I picked up while pair programming with this super smart guy Peter Seibel. Apparently it's called "Programming by Wishful thinking."

The idea is that when you have some code to write you first do so as if methods existed already to perform the high level operations. Say you want to write some code that tells you how many songs and albums you have by a given artist. Programming languages don't have methods do to this baked in but you pretend they do. An extreme example of the practice would be:

I came up with this example in about thirty seconds so I hope it isn't to opaque.

You may then go on to write the above methods in the same way:

and so on and so on. Anyway, don't know how many people out there use this style but I've found it really helpful when I need to keep focused on the task/flow at hand without getting to distracted by the details of an issue. Maybe it will help someone else too.