Opportunistic would be the best way to describe our hiring style at Terracotta. We always have our eyes open for the super sharp so I figured I would post about what we generally look for and see if anyone out there shows up to make more magic happen for us. I talked a bit about what I think people should look for when hiring in my blog on teams.
If you are:
Passionate - Do you actually love having complex problems to solve? Do you lie awake at night thinking about how best to design, factor, and improve software? Are you always striving for improvement and learning?
Knowledgeable - I'm not going to list a bunch of frameworks here. The knowledge we mostly look for is about knowing how to solve tough problems. The one must-have is multi-threaded programming. Experience building some kind of infrastructure software is a big plus. Whether it's an app server, a database or a JMS system isn't so important, but knowing how to build things that need to be scalable is a big plus (just to be clear: building those things, not using them). For some of our stuff, knowing classloaders cold and byte code manipulation is a big plus as well. Most of all you must love and be good at writing software.
Respect/Teamwork - This is complex stuff and we are not a big company, so you need to be able to talk to and work with others. No room for people who are pedantic or self-serving. It has to be about the product and the software for you, not the title. This is true no matter what role you are looking at filling.
Judgment - Each person has to have excellent judgment. You must be able to focus on what's important. If you don't know something, that's not a problem, but communicate and ask questions. Don't rat hole and don't hide problems.
Intelligence - Not going to lie. You must be smart. You should be a person who can analyze and design complex algorithms, and debug and solve complex problems. We will ask you to answer algorithm questions and write some code in the interview.
I know that's a pretty tough list but if you think what we do is cool and fit most of that please shoot your resume to careers at terracottatech dot com.
For those who don't already know about Terracotta here is a brief overview:
We are a well-funded startup based in San Francisco, but with developers all over the world. Our product is open source network attached memory for Java and is probably the most interesting and diverse product one could work on, stretching from distributed computing to byte code manipulation.
Scaling software should be an activity done with ease by Hardware, Cloud and System Administrators without knowledge of the applications being scaled.
Saturday, July 21, 2007
Thursday, July 19, 2007
What were those results again?
This is a small follow-up to my blogs about anti-patterns. When trying to debug a complex logic or performance problem, one of the most important things one can do is take notes. I had a conversation with someone the other day where the person said, "I'm in a rush, I don't have time to take notes about my runs." To this I replied, "You don't have time not to."
So often we are in such a rush to solve a problem that we cut the wrong corners. When performance tuning, or tracking down something that requires multiple runs or configurations of your software, always, always, always take notes on each run. They don't have to be super formal, but you should write down all the details you can think of. Some examples: What were my settings? What did the CPU usage and machine stats look like? What problems did I run into? Always keep a date/time stamp on the tests. This will prevent the inevitable rerunning of tests because you forgot the results, or mixing up what you have and have not tried. It only takes one mistake to use up more time than tons of note taking would require.
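If plain text notes feel like too much friction, even a tiny helper can do the job. This is a minimal sketch, not a tool I'm prescribing: the `RunNotes` class and its line format are made up for illustration, and it only formats one timestamped line per run that you can append to whatever file you like.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: formats one timestamped note line per test run.
final class RunNotes {
    // Builds a line of the form: <timestamp> | key=value key=value | <observations>
    static String format(Instant when, Map<String, String> settings, String observations) {
        StringBuilder sb = new StringBuilder(when.toString()).append(" |");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.append(" | ").append(observations).toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the settings in the order they were written down.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("threads", "8");
        settings.put("nodes", "4");
        System.out.println(format(Instant.now(), settings, "cpu around 60% on node 2, no errors"));
    }
}
```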
Friday, July 13, 2007
More Lies - Distributed Performance Testing Anti-patterns Part 3 of 4
In part 3 of 4 of this blog, much like parts 1 and 2, I will hit on anti-patterns that allow your performance testing of clustered and/or distributed software to lie to you. I'll be following up part 3, which covers the last 4 anti-patterns, with a blog about a simple distributed testing framework I have begun. Hopefully enough of you will be interested, try it and maybe even contribute to it.
Anti-pattern 7: In-memory vs. Distributed Performance Comparison
Description
Writing a test that compares the speed of adding objects to a local, in-memory data structure vs. adding objects to a clustered data structure.
Problem
To avoid suspense, I'll tell you the results of that test without running it. Adding things to a local, in-memory data structure takes virtually no time at all. In-memory object changes happen so fast, they are hard to even measure. However, when you are making changes to a distributed data structure, no matter what, those state changes have to be shipped off to another location. Making that happen takes extra instructions on top of the ones used for the original task. This isn't just slower, it is way slower. The comparison between in-memory object changes and distributed object changes is useless.
Solution
Figure out how much data you are going to be clustering and what the usage patterns of that data becoming clustered will be. Then simulate and time that. Once again, focus on total throughput with acceptable latency.
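To make "total throughput with acceptable latency" concrete, here is a minimal sketch of the two numbers I would pull out of a timed run. The `LatencyStats` class is hypothetical, and the nearest-rank percentile is just one reasonable choice, not the only way to summarize latency.

```java
import java.util.Arrays;

// Hypothetical helper: summarizes a timed run as throughput plus a latency percentile.
final class LatencyStats {
    // Nearest-rank percentile over recorded latencies (nanoseconds); p is in (0, 100].
    static long percentile(long[] latenciesNanos, double p) {
        if (latenciesNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesNanos.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Operations per second for a given operation count and elapsed wall-clock time.
    static double throughputPerSecond(long operations, long elapsedNanos) {
        return operations / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        long[] samples = {120_000, 95_000, 400_000, 110_000, 105_000};
        System.out.println("p99 latency (ns): " + percentile(samples, 99));
        System.out.println("ops/sec: " + throughputPerSecond(samples.length, 1_000_000_000L));
    }
}
```

Decide up front what latency counts as acceptable (say, a p99 budget), then compare throughput only between runs that stay inside it.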
Anti-pattern 8: Ignore Real-world Cross-node Patterns
Description
Reading and writing the same data in every node.
Problem
Generally speaking, whether reading or writing, it is more expensive to access the same data concurrently across all nodes. Depending on the underlying clustering infrastructure, this can be more or less of a problem. If you are using an “everything everywhere” strategy, the performance hit of random access across all the data on all the nodes is less, but the “everything everywhere” sharing strategy generally does not scale well. Most other strategies perform better when data is consistently read and/or written from the same node.
Solution
Write your performance tests in a way that allows you to set a percentage for locality of reference. Is an object accessed on the same node 80%, 90%, or 99% of the time? You should usually have some cross-node chatter, but usually not too much—although you should be as realistic to the problem you are trying to solve as possible.
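A configurable locality percentage can be as simple as a weighted coin flip per operation. The sketch below is an illustration under my own assumptions: the `LocalityPicker` name is invented, and it assumes your test data is partitioned so that each node "owns" a slice of it.

```java
import java.util.Random;

// Hypothetical helper: decides which node's slice of data the next test operation touches.
final class LocalityPicker {
    private final int nodeCount;
    private final int localNode;
    private final double localFraction; // e.g. 0.90 for 90% locality of reference
    private final Random random;

    LocalityPicker(int nodeCount, int localNode, double localFraction, long seed) {
        if (nodeCount < 2 || localNode < 0 || localNode >= nodeCount) {
            throw new IllegalArgumentException("need at least 2 nodes and a valid local node index");
        }
        if (localFraction < 0.0 || localFraction > 1.0) {
            throw new IllegalArgumentException("localFraction must be between 0 and 1");
        }
        this.nodeCount = nodeCount;
        this.localNode = localNode;
        this.localFraction = localFraction;
        this.random = new Random(seed);
    }

    // Returns the index of the node whose data the next operation should access.
    int nextOwner() {
        if (random.nextDouble() < localFraction) {
            return localNode;
        }
        // Otherwise pick one of the other nodes uniformly, skipping the local index.
        int other = random.nextInt(nodeCount - 1);
        return other >= localNode ? other + 1 : other;
    }

    public static void main(String[] args) {
        LocalityPicker picker = new LocalityPicker(4, 0, 0.90, 42L);
        int local = 0;
        for (int i = 0; i < 100_000; i++) {
            if (picker.nextOwner() == 0) {
                local++;
            }
        }
        System.out.println("local accesses: " + local + " of 100000");
    }
}
```

Run the same test at 80%, 90%, and 99% and compare; how much throughput moves tells you how sensitive your setup is to cross-node chatter.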
Anti-pattern 9: Ignore Usage Patterns
Description
The performance test either just creates objects or just reads objects.
Problem
In the real world, an application does a certain amount of reading, writing, and updating of shared objects. And those reads, writes, and updates are of certain sizes.
Solution
If your app likely changes only a few fields in a large object graph, then that is what your performance test should do. If your app is 90% read from multiple threads and 10% write from multiple threads, then that is what your test should do. Make your test true to what you need when it comes to data and usage.
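Here is a minimal sketch of a 90/10 style mix driven from multiple threads. Everything in it is an assumption for illustration: the `MixedWorkload` class is hypothetical, and the plain `Map` passed in stands in for whatever clustered structure you are actually testing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical workload driver: mixes reads and writes at a configurable ratio.
final class MixedWorkload {
    final LongAdder reads = new LongAdder();
    final LongAdder writes = new LongAdder();

    // `shared` stands in for the clustered data structure under test.
    void run(Map<Integer, String> shared, int threads, int opsPerThread, double readFraction) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < opsPerThread; i++) {
                    int key = rnd.nextInt(1000);
                    if (rnd.nextDouble() < readFraction) {
                        shared.get(key);
                        reads.increment();
                    } else {
                        shared.put(key, "value-" + i);
                        writes.increment();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workload did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workload", e);
        }
    }

    public static void main(String[] args) {
        MixedWorkload workload = new MixedWorkload();
        workload.run(new ConcurrentHashMap<>(), 4, 10_000, 0.90);
        System.out.println("reads=" + workload.reads.sum() + " writes=" + workload.writes.sum());
    }
}
```

A real test would also make the writes look like your app's writes, for example by changing a couple of fields in a large graph rather than replacing whole values.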
Anti-pattern 10: Log Yourself to Death
Description
Last, but far from least, doing extra stuff like writing data out to a log chews up CPU. Logging too much in any performance test can render the test results meaningless.
This anti-pattern generally covers any extra CPU usage on a load-generating client that affects the performance test. In general, if one or more of your nodes is CPU bound in a cluster performance test, you likely have not maxed out the performance of your cluster. Let me say that again: if you are resource constrained on any node, including your load-generating nodes (but not including your server, if one exists), then you are probably not maxing out what your cluster as a whole can handle. Investigate further.
Problem
If the individual load-generating nodes—or even the clustered nodes—are resource constrained, it is likely to create a false bottleneck in your test. You are trying to figure out the throughput of the cluster and your cluster nodes are likely busy doing other things like logging.
Solution
First, always have machine monitoring on all nodes in a performance test. Any time one of the nodes or load generators becomes resource constrained, make sure you test with an additional node and see if it adds to the scale. If a node is unexpectedly resource constrained, then take a series of thread dumps (Java only) and figure out where all the time is going.
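From outside the process, the JDK's `jstack` tool will give you a thread dump. If you would rather sample from inside the JVM, the standard `ThreadMXBean` can do it too. The sketch below is a rough illustration: the `ThreadDumpSampler` class is hypothetical, and counting only the top frame of RUNNABLE threads is a crude heuristic I'm assuming for brevity, not a real profiler.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sampler: takes a series of in-process thread dumps and tallies hot frames.
final class ThreadDumpSampler {
    // Counts how often each top-of-stack frame shows up among RUNNABLE threads.
    static Map<String, Integer> sampleTopFrames(int samples, long intervalMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                if (info == null || info.getThreadState() != Thread.State.RUNNABLE) {
                    continue;
                }
                StackTraceElement[] stack = info.getStackTrace();
                if (stack.length > 0) {
                    String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                    counts.merge(frame, 1, Integer::sum);
                }
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Five dumps, 200 ms apart; frames that keep showing up deserve a closer look.
        sampleTopFrames(5, 200).forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

If the frames that dominate are in your logging code, you have found your false bottleneck.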
Alright, that is the end of my anti-pattern list for now. I could probably come up with a few more but I'll save them for another day. The moral of this section of the blog is to be curious and skeptical with your testing results. Don't just ask what the numbers are. Find out why and you will end up a much happier person.
Monday, July 2, 2007
Distributed Performance Testing Anti-patterns Part 2 of 4
In Part 1 of this 4 part blog I hit upon 3 Anti-Patterns that can make one's performance testing a poor representation of reality. Here I'm covering 3 more and will be following up with the last 4 in a few days. After that I'm going to talk about a simple distributed performance testing framework I'm going to give away to try and help people be more successful with this stuff.
Anti-pattern 4: Fake Data Fake Performance
Description:
Using data in a distributed performance test that looks nothing like your real data.
Problem:
Distributed computing solutions use all kinds of strategies to move data between nodes under the covers. Just representing a size of data to be shared ignores those strategies and in many cases misrepresents the performance of a real system with real data under real load, both positively and negatively. You may be testing specially optimized flattening tricks that make the system look faster than it is; likewise, you may be testing a particular case that doesn’t perform well, but that isn’t representative of the true performance of the system with real data.
Solution:
Make sure you test with object graphs that vary in size, type, and depth in similar ways to the data you plan to use in your application. Don't assume Maps of Strings will behave anything like the way real object data will behave.
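As a rough illustration of "vary in size, type, and depth", here is a minimal sketch of a random graph generator. The `GraphGenerator` class and its `Node` type are hypothetical stand-ins; in a real test you would shape them after your own domain objects rather than use mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator for test data that is not just a Map of Strings.
final class GraphGenerator {
    // Made-up node type mixing several field types, loosely like a domain object.
    static final class Node {
        final String name;
        final int quantity;
        final byte[] payload;
        final List<Node> children = new ArrayList<>();

        Node(String name, int quantity, byte[] payload) {
            this.name = name;
            this.quantity = quantity;
            this.payload = payload;
        }
    }

    // Builds a graph at most maxDepth levels deep (maxDepth >= 1), with random fan-out and payload sizes.
    static Node generate(Random rnd, int maxDepth, int maxChildren, int maxPayloadBytes) {
        Node node = new Node("node-" + rnd.nextInt(1_000_000), rnd.nextInt(100),
                new byte[rnd.nextInt(maxPayloadBytes + 1)]);
        if (maxDepth > 1) {
            int childCount = rnd.nextInt(maxChildren + 1);
            for (int i = 0; i < childCount; i++) {
                node.children.add(generate(rnd, maxDepth - 1, maxChildren, maxPayloadBytes));
            }
        }
        return node;
    }

    // Number of levels in the graph, counting the root as 1.
    static int depth(Node node) {
        int deepest = 0;
        for (Node child : node.children) {
            deepest = Math.max(deepest, depth(child));
        }
        return 1 + deepest;
    }

    // Total number of nodes in the graph.
    static int size(Node node) {
        int total = 1;
        for (Node child : node.children) {
            total += size(child);
        }
        return total;
    }

    public static void main(String[] args) {
        Node root = generate(new Random(42L), 5, 4, 2048);
        System.out.println("nodes=" + size(root) + " depth=" + depth(root));
    }
}
```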
Anti-pattern 5: Incoherent Cluster
Description:
Some clustering products are coherent, some are not, and some have both modes. Don't ignore whether you are testing the performance using the mode you really need for your application.
Problem:
While it is quite possible to have a coherent cluster that has the same throughput as an incoherent cluster, it is certainly harder to do. Coherently clustered software frameworks require the provider to do some fancy locking, batching, windowing, and coherent lazy-loading tricks that aren't for the faint of heart (in the internals of the clustering engine, that is, not for the application developer). You can't assume that performance between a coherent and incoherent clustering approach will be the same.
Solution:
Make sure that if what you need is coherently clustered data, you are actually testing that way. Also, if it’s coherence you’re after, it’s a good idea to verify the end-state of a performance test to make sure the system actually is coherent. Think of it as a post-test verify phase.
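A post-test verify phase can be very small. The sketch below assumes you can collect each node's final view of the shared data as a plain `Map` and that your test knows the expected end-state; the `CoherenceCheck` class and that setup are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical post-test check: compares every node's final view against the expected state.
final class CoherenceCheck {
    // Returns human-readable mismatches; an empty list means every view matched.
    static List<String> verify(Map<String, Long> expected, List<Map<String, Long>> nodeViews) {
        List<String> problems = new ArrayList<>();
        for (int node = 0; node < nodeViews.size(); node++) {
            Map<String, Long> view = nodeViews.get(node);
            for (Map.Entry<String, Long> e : expected.entrySet()) {
                Long actual = view.get(e.getKey());
                if (!e.getValue().equals(actual)) {
                    problems.add("node " + node + ": " + e.getKey()
                            + " expected " + e.getValue() + " but saw " + actual);
                }
            }
            if (view.size() != expected.size()) {
                problems.add("node " + node + ": has " + view.size()
                        + " entries, expected " + expected.size());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Long> expected = new java.util.HashMap<>();
        expected.put("counter", 1000L);
        Map<String, Long> staleView = new java.util.HashMap<>();
        staleView.put("counter", 998L);
        // One node agrees with the expected state, one is stale.
        verify(expected, java.util.Arrays.asList(expected, staleView)).forEach(System.out::println);
    }
}
```

A fast run that fails this check is not a fast coherent run, so I would treat any mismatch as invalidating the numbers.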
Anti-pattern 6: The World by a Thread
Description:
Distributed tests that only use one thread per node.
Problem:
For most clustered software, the name of the game is throughput with acceptable latency. Pretty much all distributed computing software does batching and windowing to improve throughput in a multi-threaded environment. Maxing out a single thread will usually not even approach the max throughput of the JVM or the system as a whole, just as maxing out a single node will not.
Solution:
Make sure your test uses multiple threads for generating load in each JVM. Check to see if you are CPU bound on any node. If you are not CPU bound, you might have a concurrency issue or just need to add more threads.
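Here is a minimal sketch of a multi-threaded load generator for one JVM. The `LoadGenerator` class is hypothetical, and the `Runnable` you pass in is a placeholder for one operation against your clustered data.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical load generator: runs one operation from many threads for a fixed time.
final class LoadGenerator {
    // Returns how many operations completed across all threads in roughly durationMillis.
    static long run(int threads, long durationMillis, Runnable operation) {
        LongAdder completed = new LongAdder();
        long deadlineNanos = System.nanoTime() + durationMillis * 1_000_000L;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                // Subtraction form keeps the comparison correct if nanoTime wraps around.
                while (System.nanoTime() - deadlineNanos < 0) {
                    operation.run();
                    completed.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return completed.sum();
    }

    public static void main(String[] args) {
        AtomicLong shared = new AtomicLong();
        for (int threads : new int[] {1, 2, 4, 8}) {
            long ops = run(threads, 500, shared::incrementAndGet);
            System.out.println(threads + " threads: " + ops + " ops in about 500 ms");
        }
    }
}
```

Step the thread count up and watch where throughput stops growing. If that happens while the CPU is still mostly idle, suspect contention rather than a lack of horsepower.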
Conclusion:
I have 4 more anti-patterns that I'm going to publish next week. Keeping an eye on the full 10 will help greatly reduce mistakes in clustering and distributed computing. Once again I'll then be following up with a framework to help develop and run useful tests.
