Concurrent Distributed Data Structures?
Many challenges exist when developing a high scale multi-node application. Our goal at Terracotta is to take on those challenges in ways that remove them from the plate of those architecting and developing applications and place them squarely on our shoulders.
In order to accomplish such a lofty goal we first had to create some core pieces of infrastructure on which many higher order abstractions could be built. One such "piece" is our ConcurrentDistributedMap. This data structure is a fundemental piece of our Distributed Cache, our Hibernate product and our Web Sessions product and is also available for use in custom solutions for those using Terracotta as a platform.
Challenges and Tradeoffs
Developing a data structure that is Distributed as well as Concurrent and Coherent has very different trade-offs from developing for a single JVM. If one took a standard concurrent data structure like ConcurrentHashMap and just clustered it "as is" one would likely run into performance and memory efficiency issues. Even a really cool concurrent data structure like Cliff Click's Non Blocking Hash Map would not do well if one used the algorithms without thought in a coherent cluster.
The challenge is that the trade-offs change when you add the latency of a network and data locality in the middle of the game. In normal concurrent data structures you care about:
- How long you hold locks
- How much is locked while you hold it.
- CPU usage
- Memory Usage and Object creation
In the clustered case you add the following:
Lock locality - Is the lock you need already held on the local machine or do you need to go get it over the network. If you need to go get it how long does that take. While a little of the question of "How long does it take to get the lock" exists on a multi-cpu single machine it's not nearly to the same degree.
Data locality - Is the data I need already local or do I need to go get it. If I need to get it how long does that take
Data change rate - How much clustered data am I changing and how long does it take to send it around? Also, do I send it around?
Data size - In a clustered world one often uses data structures that don't fit entirely in a single node. One has to take pains to control the size and amount of the data in each JVM for efficiency.
There are other implementation specific/point in time issues like number of locks and their cost but those can mostly be optimized away at the platform level.
Single JVM ConcurrentHashMap
ConcurrentHashMap adds concurrency by collecting groups of entries into segments. Those segments are grouped together both from a lock perspective, they share a lock, and from a physical space perspective, all entries in a segment are generally in one collection. In a single JVM the only risk of sharing a lock between the entries is that one can contend on the in-memory speed look-ups. This is a very effective way to handle large numbers of threads making highly contended gets and puts to the map. If one runs into contention with this kind of data structures one can just up the number of segments in the Map.
Concurrent Map In A Clustered World
In a clustered world problems occur with a data structure like this. First, getting a lock or an object can be either in-memory speed or take many times in-memory speed depending on whether it has recently been accessed locally. In some cases this is no problem and in some cases it's pretty bad. It's also a space issue. If a segment is brought in as a whole and it's entries are in that segment strictly because of it's hashCode then the natural partitioning of the app's usage won't help save space by only loading the entries needed locally. Instead it will load the needed objects and anything else in it's segments. This elimenates the benefits of any natural or forced locality that occurs in a multi-node application.
In order to highlight some of the pro's and con's of CHM (ConcurrentHashMap) I'm going to vet it against a few use-cases.
Use-case 1 - An 8 node app sharing a clustered ConcurrentHashMap
All the data in the map is read only and it's used in all nodes evenly and the data fits entirely in a single JVM's heap.
GOOD NEWS! you will be fine with a regular clustered ConcurrentHashMap. Lets look at why.
1) All data will be loaded everywhere so unnecessary faulting (the act of pulling a data item into a node) won't be happening
2) All locks will be read locks and will be local everywhere so your latency will be nice and low (Due to greedy locks)
3) Won't have contention on the segments because reads are pretty much concurrent
Use-case 2 - The same as use-case 1 but now the map data is bigger than memory and you have a sticky load balancer.
Some good and some bad:
1) Since data is batched into segments by hash code and your load balancer hashes on something completely different than your map hashes on you will end up loading data into each node that is not needed. This is a result of the ConcurrentHashMap segmenting strategy.
2) Locks will still be fine because it's all read and read locks are very concurrent so segment contention won't be an issue.
So the memory manager may be doing unnecessary work and whether you will be in trouble depends on how big the ConcurrentHashMap is
Use-case 3 - Same as use-case 2 with the exception that now we are doing 50 percent writes. Something similar to caching conversations.
1) Still have the above problem of loading unneeded batches
2) But now, due to the writes, you are also maintaining the state of the objects that have unnecessarily poor locality in all the nodes where they don't belong.
3) Now you have a locking problem. While writing an entry to a segment you are blocking people in other nodes from reading or writing to that segment adding some serious latency. Plus the locks are getting pulled around to different nodes because even though your load balancer provides locality it is on a different dimension that of the internals of the map and is therefore not helpful.
Reviewing the problems highlighted by use case 3:
- Lock hopping leading to slow lock retrieval
- Lock contention due to grouping of multiple unrelated entries with locks.
- Faulting and Memory wasting due to unfortunate segmenting of data
- Broadcasting of changes or invalidations to nodes that shouldn't care
What did we do?
We built a specialty highly concurrent map tuned for distribution and the above challenges call ConcurrentDistributedMap.
Instead of breaking things down into segments for locking we lock on the individual keys in the map. This gives the same correctness guarantees while giving the maximum concurrency. This drastically reduces lock hopping and contention and provides in-memory lock speeds most of the time.
The segments go away completely. Key Value pairs are managed on an individual basis so no unnecessary faulting occurs.
Broadcasting and invalidation:
The above, plus an efficient memory manager means that values are only faulted into nodes where they are used. Since those values aren't in all nodes anymore invalidation and or broadcasting of changes for those entries is no longer needed.
This data structure takes excellent advantage of any natural partitioning that may occur at the application level.
Building a fast, coherent, concurrent, distributed data-structure requires thinking about an extended set of concerns. However, if one pays attention to the issues it is possible to create a highly useful solution. To learn more check out the ConcurrentDistributedMap described above.
To learn more about the ConcurrentDistributedMap you can check out these links:
For more information on Terracotta's distributed data structures one can always look here: