Showing posts with label ehcache.

Tuesday, September 27, 2011

Spring Performance With Annotations and Ehcache ARC

The goal of this blog is to highlight Ehcache's new ARC feature using Spring Annotations.

Terracotta is releasing Ehcache 2.5. Its biggest new feature is ARC (Automatic Resource Control). ARC brings runtime performance tuning and caching to the systems administrator while simplifying it for the developer. I've written a couple of blogs describing what it is and the value it brings. Those two blogs and the Ehcache docs are excellent sources of information on these highly requested and useful new capabilities.


Getting Started

I wanted to give a brief demonstration of how someone would leverage Ehcache ARC in practice, so I pinged the guys who do Spring Annotations and they OK'd me to leverage one of their samples. Spring Annotations let you mark methods that perform performance-sensitive operations so their results are cached, improving the application's speed.

In order to follow along with me you'll want to read up on and check out the Spring Annotations sample here: 


Download it and play with it a bit to get familiar with how it works.
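To see what the annotations buy you, here is a sketch of the kind of hand-rolled memoization they replace (the service and method names here are hypothetical, not from the sample):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeatherService {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Hand-written caching; with Ehcache Spring Annotations this whole
    // pattern collapses to a single @Cacheable annotation on the method.
    public String getForecast(String zipCode) {
        return cache.computeIfAbsent(zipCode, this::expensiveLookup);
    }

    private String expensiveLookup(String zipCode) {
        // Stand-in for a slow DB or web-service call
        return "forecast-for-" + zipCode;
    }

    public static void main(String[] args) {
        WeatherService svc = new WeatherService();
        System.out.println(svc.getForecast("94107"));
    }
}
```

With the annotation approach, the caching concern disappears from the method body entirely; the interceptor handles lookups and stores for you.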

This sample is fairly self-contained. It shows how to use the various annotations to improve performance in a simple and concise way. What it doesn't talk about is the configuration of the underlying cache in use. This ends up being very challenging in practice. Without help you'll run into at least a few of these 5 problems:

  • Crash - Out of Memory Errors can crash your Application
  • Pause - Long GC's will pause your application leading to an unpredictable user experience
  • Wasted Space - In order to avoid the above two problems you over-provision memory, wasting tons of space
  • Tuning Hell - Either you or your users spend tons of time making arbitrary decisions trying to balance the memory usage of your caches to avoid the above 3 issues
  • Poor Performance - Caches don't really help because they are incorrectly sized

Let's look at the ehcache.xml that ships with this sample:
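The original listing isn't reproduced here, but the relevant cache element looks roughly like this (a sketch using standard ehcache.xml attributes; the cache name is hypothetical):

```xml
<ehcache>
    <cache name="weatherCache"
           maxElementsInMemory="100"
           eternal="false"
           timeToLiveSeconds="300"
           overflowToDisk="false"/>
</ehcache>
```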


The important parts of this config are the sizing and expiry settings: it sets maxElementsInMemory to 100 and timeToLiveSeconds to 300.

Why do these need to be set? Let's look at them individually:

maxElementsInMemory=100

When you started your Tomcat instance, some amount of memory was reserved on the command line (-Xms/-Xmx). If you didn't set maxElementsInMemory, your cache would grow unbounded until long GCs and then an OOME occur. So maxElementsInMemory is a resource-management control. But why 100? Is it because you know how big your entries are and that 100 of them is exactly some percentage of the heap? More likely it's a guess, probably a lowball guess chosen so you don't OOME, wasting a whole bunch of heap. Now say you have tens or hundreds of caches that hold objects of varying size. How much harder is it to set maxElementsInMemory now?

timeToLiveSeconds=300

This is a data-freshness control. If you're using an unclustered cache across multiple nodes, objects are likely being updated in a DB and the data in your cache can get out of date. How out of date? That depends on what you set timeToLiveSeconds to.
NOTE: Keeping all nodes up to date is one of the reasons people use clustered caches.

Configuring a cache using the above controls makes an application sensitive to use-case changes, changes in JVM heap settings, and the number of caches. It also leaves the user of the application with an unpleasant choice: leave a lot of headroom to handle the worst case, risk OOMEs, or force the admin or user to tune at every deployment.


Ehcache ARC Gets Performance Without All The Tuning

These are the problems that Ehcache ARC, the most requested feature in Ehcache history, is designed to eliminate. Ehcache ARC makes your caches controllable using the same metric the JVM is started with, the same resource you are trying to manage: bytes! In addition, instead of having to tune each individual cache, you can make a high-level choice at the CacheManager level and let ARC balance the resources for you.

In this change to the sample we have removed the cache entry count and instead allowed the caches to use 50% of the JVM heap. This setting tells the cache manager to allow up to 50 percent of the heap for caching (it can also be configured as a fixed number of bytes instead of a percentage). It will balance this heap across whatever caches are defined with this manager. This allows an application to manage the available resources without waste and without GC pauses or OOMEs.
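A sketch of what the updated configuration looks like (in the GA 2.5 schema the CacheManager-level attribute is named maxBytesLocalHeap; the cache names here are illustrative):

```xml
<ehcache maxBytesLocalHeap="50%">
    <!-- No per-cache sizing needed: ARC balances the 50% of heap
         across all caches defined in this manager -->
    <cache name="weatherCache" timeToLiveSeconds="300"/>
    <cache name="userCache" timeToLiveSeconds="600"/>
</ehcache>
```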

I updated spring-annotations to use Ehcache 2.5.0 and then rebuilt the sample using my newly built version of Ehcache Spring Annotations. You can grab the updated jar here, though I suspect the Spring Annotations guys will update the official build soon.

By using this approach, as you add more and more @Cacheable annotations to your application you don't have to change anything about the caches you create in ehcache.xml. They make use of the available cache resources as provided.

Wrapping Up


It has always been pretty easy to integrate a cache into an application, and things like Spring Annotations move the bar on that simplicity even further. The challenge was to make those caches effective. How does one get the most performance out of the available resources? Ehcache ARC was created to solve that problem for system administrators, OEMs, developers, and the entire OSS community. We are looking forward to people getting started with it and hope to receive lots of feedback.

Thursday, March 24, 2011

"What" "When" and "Where" ... Quartz Scheduler 2.0 Goes GA

What is Quartz?

Quartz is the lightweight, open source, enterprise-class job scheduler for Java. For years Quartz has put the "When" in Java applications via its full-featured scheduling capabilities. As a result, from Spring to JBoss, Quartz is embedded in just about every major Java product. Quartz is extremely robust and full featured, providing things like HA, transaction support, and all the precise guarantees one needs to assure reliable job execution.

Terracotta has made a number of incremental improvements since taking over stewardship of Quartz. We've been evolving Quartz one step at a time while collecting which features the Quartz community was really interested in.

Quartz 2.0 is the realization of those user requested features!

"What" "When" and "Where"

Let's step back a bit and review where Quartz fits in the Terracotta world view. As most people know, Terracotta is focused on adding snap-in performance to JVM-based applications through Scale-out (Terracotta Server Array), Scale-up (BigMemory) and Speed-up (Ehcache). We do this while maintaining our goals of simplicity, predictability, and density (can your datastore hold 2 billion entries per node in-memory?) throughout our product line. We view this data-scaling layer as solving the "What", the data problem, of an application. But when going from one to many nodes, that's just the first of two major scale points.

Up until now, Quartz Scheduler has been solely focused on the "When" part of code execution: making sure code execution happens exactly when it's supposed to. It has been pretty much ubiquitous in that space, with literally hundreds of thousands of users. While it scaled out with JDBC and Terracotta, it gave precious little control over where jobs execute.

With Quartz Scheduler 2.0 we've now added the "Where." This is about where code executes. Instead of randomly choosing a node for job execution in a scaled-out architecture, you can now create constraints that inform the decision about where a job should be executed. You can do this based on:
  • Resource Availability - CPU available, Memory Available, custom constraints
  • Ehcache's data locality - Bring the work to where the data is
  • Static allocation - Just decide where it goes
This gives Terracotta the ability to Snap-in and handle both the major scale-out points.

What's new in Quartz 2.0

The simple answer is A LOT!
  • Easy to use Fluent API - Quartz 2.0 has a new, easy to use fluent interface that hides the complexity of building out the description of your jobs behind a simple description of what you want to happen and when. I wrote a short blog about this when it was in beta.
  • Quartz "Where" - Constraint based system for controlling where jobs execute based on things like CPU and Memory usage, OS, and Ehcache data locality
  • Quartz Manager - A flash based GUI console for managing and monitoring your scheduler in production.
  • Batching - Helps improve a scheduler's throughput by allowing one to make trade-offs between perfectly on-time execution and the benefits of batching.
  • Tons of bug fixes and features - Lots of long-requested features. Check out the link for the list.
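As a taste of the new fluent API, here is what building a job and trigger looks like in the Quartz 2.0 style (a minimal sketch; the job class and the identity names are invented for illustration):

```java
import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.impl.StdSchedulerFactory;

import static org.quartz.JobBuilder.newJob;
import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.TriggerBuilder.newTrigger;

public class FluentApiDemo {

    public static class HelloJob implements Job {
        public void execute(JobExecutionContext ctx) {
            System.out.println("Hello from " + ctx.getJobDetail().getKey());
        }
    }

    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

        // Fluent builders replace the old constructor-and-setter style
        JobDetail job = newJob(HelloJob.class)
                .withIdentity("helloJob", "demo")
                .build();

        Trigger trigger = newTrigger()
                .withIdentity("everyTenSeconds", "demo")
                .startNow()
                .withSchedule(simpleSchedule()
                        .withIntervalInSeconds(10)
                        .repeatForever())
                .build();

        scheduler.start();
        scheduler.scheduleJob(job, trigger);
    }
}
```

The builder chain reads as a description of what you want to happen and when, which is exactly the point of the new API.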

Quartz 2.0 Is Now GA

Quartz 2.0 makes a big leap in usability, power, visibility, and management, and it's NOW GA. We did this while maintaining Quartz's reliability and predictability. We focused on making sure existing users would have an easy time moving forward, so check out the migration guide if you need help. It really is the next generation of Quartz. Give it a try and give us feedback, as we are constantly working to make things better!

More Reading



Wednesday, June 9, 2010

A Couple Minutes With The Terracotta 3.3 Beta

With Terracotta platform version 3.3, our goal of creating an accessible application performance and scale-out solution takes another big step forward. It's important to us that developers can find success using the ubiquitous Ehcache for performance, Hibernate for an ORM, HTTP web sessions for user state, and Quartz for scheduling in a single-node application, and then, without much thought or effort, add a couple lines of config to achieve scale-out and HA serving needs all the way from enterprise apps into the cloud.

We have focused on a couple of high level areas in this release to take our solution to the next level.
  • Simple Scale - Reduce the need for tuning and tweaking with an improved next-gen datastore. It will allow the everyday user to achieve the kinds of scale needed for massive applications, both in data size and number of nodes.

  • Improved Visibility - We have added panels for Quartz and Sessions to our developer console, giving visibility across the full suite of performance and scale-out products. We have also organized the information in the tool in a more product-focused way.

  • Simple HA - A new panel that makes it easier to monitor interesting events that occur in a cluster, pre-built templates for various configurations, a simplified way of migrating nodes, and better defaults.

  • Modularity - We have exposed some of our most powerful pieces and parts as a versioned standard API that can be coded against to get things like locking, queuing, maps and cluster topology. We use this API to build all four of our core products (Ehcache, Quartz, Hibernate 2nd level cache, Web Sessions).
While not everything made it into the beta, enough is there to get a taste of where we are going. I encourage people to download it and try it as soon as possible. GA is just around the corner. Learn more on the release notes page.

Saturday, May 1, 2010

A Couple Minutes With Terracotta Toolkit Nightly

A couple days ago I wrote a short blog about our goal of creating a Terracotta Toolkit for others to easily build simple-to-use scaled-out frameworks, tools and, in some cases, apps. This same exact toolkit is used to build Ehcache, Hibernate, HTTP Sessions and Quartz, which gain scale and HA just by adding 2 lines of config. I was pretty shocked and excited about what a tremendous reaction this blog and the toolkit received. Usually when I write a blog most of its traffic comes from awesome sites like DZone, but this blog was different. I actually got considerably more direct traffic from things like e-mails and Twitter than anything conventional. That kind of viral excitement shows a much deeper level of curiosity and is tremendously motivating for the team here. Anyway, I digress.

Me being the curious type, I decided to do something a little dangerous yesterday. I asked Geert, one of the leads on the Terracotta Toolkit project, if we were in a state where I could play around with it a bit and get a sense of how hard it is to get started. We discussed the usual caveats ("We haven't agreed on some of the naming and factoring stuff yet"), which I clearly understood. Knowing that, and that I'd be working off a nightly (this link downloads it), I set out to follow his simple instructions and see just how easy it was. Here is what I did.

Geert gave me a simple cyclic barrier sample to work off of. This is a sample I've whipped up myself a few times over the years. Many of our earliest users leveraged Terracotta strictly as a distributed lock manager, and that's what this simple sample does as well. You start the app with a barrier name and a node count and it just hangs until the node count is reached. Seems like as good a sample as any to highlight how to get started with the toolkit.
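For comparison, the single-JVM version of that idea is just java.util.concurrent.CyclicBarrier with threads standing in for nodes; the toolkit sample below distributes the same pattern across processes:

```java
import java.util.concurrent.CyclicBarrier;

public class LocalBarrierDemo {
    public static void main(String[] args) throws Exception {
        final int parties = 3;
        // Every thread blocks in await() until the third one arrives
        final CyclicBarrier barrier = new CyclicBarrier(parties);
        for (int i = 0; i < parties; i++) {
            final int id = i;
            new Thread(new Runnable() {
                public void run() {
                    try {
                        System.out.println("Waiting ... " + id);
                        int index = barrier.await();
                        System.out.println("... finished " + index);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
    }
}
```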

  • Downloaded and unpacked the nightly
tar -xzvf terracotta-trunk-nightly-rev15534.tar.gz

  • Grabbed a quick sample app

import org.terracotta.api.ClusteringToolkit;
import org.terracotta.api.TerracottaClient;
import org.terracotta.coordination.Barrier;

public class PlayingWithExpressBarrier {

    public static void main(String[] args) {
        final String barrierName = args[0];
        final int numberOfParties = Integer.parseInt(args[1]);

        // Start the Terracotta client
        ClusteringToolkit clustering =
                new TerracottaClient("localhost:9510").getToolkit();

        // Get an instance of a barrier by name
        Barrier barrier = clustering.getBarrier(barrierName, numberOfParties);

        try {
            System.out.println("Waiting ...");
            int index = barrier.await();
            System.out.println("... finished " + index);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

  • I worked in Eclipse, so at this point all I had to do was add the toolkit jar to the classpath to get it to compile
terracotta-trunk-nightly-rev15534/common/terracotta-toolkit-1.0-runtime-1.0.0-SNAPSHOT.jar

  • Now kickoff the Terracotta server
terracotta-trunk-nightly-rev15534/bin/start-tc-server.sh

  • And run the sample 3 times
java -cp .:terracotta-toolkit-1.0-runtime-1.0.0-SNAPSHOT.jar PlayingWithExpressBarrier barrierName 3


The first two nodes should hang waiting for the 3rd node. Once the 3rd node is started all three should run to completion. This is just a little taste of what is coming.

The toolkit will have highly concurrent maps, locks, counters, queues, evictors and more, so stay tuned. All of the toolkit classes will have tests in our TCK. The toolkit will run on any 1.5/1.6 JVM and requires no boot-jars, agents or container-specific code. A given version of the TCK will allow people to just drop in a toolkit jar at runtime for any app developed against the version of the API that TCK covers. I'll talk a bit more about how versioning the toolkit API will work in a future blog, but it will work like any other spec: you can use any implementation that implements the spec. Enjoy, much more to come.

Monday, October 26, 2009

5 Hints You're Using a Map When You Should Be Using a Cache

When developing software in Java, one almost always ends up with a few maps that contain keyed data, whether it's username -> conversational state, state code -> full state name, or cached data from a database. At what point do you move from a basic Map (or one of its fancier variants like a LinkedHashMap subclass or ConcurrentHashMap) to an open source, lightweight cache like Ehcache?

Here are 5 quick things to look for:

5) You've built your own Map-loader framework for bootstrapping and/or read-triggered loading.
4) You need to be able to visualize and/or control your Map via JMX or a console. For example, you want to watch the hit rate of your Map or track its size.

3) You're hacking in "overflow to disk" functionality and/or persistence for your Map in order to handle memory pressure and/or restartability.

2) You're hacking in special behavior to cluster Maps when you scale out. This includes things like writing your own invalidation and/or replication solution over JMS or RMI.

1) You find yourself implementing your own eviction strategies.
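Hint number 1 is the easiest to fall into. A hand-rolled LRU built on LinkedHashMap looks harmless until you also need hit statistics, persistence, and clustering (a sketch, not from any real codebase):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HandRolledLru<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public HandRolledLru(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true turns this into an LRU
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Home-grown eviction strategy: drop the least recently used entry
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        HandRolledLru<String, String> cache = new HandRolledLru<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a" so "b" becomes the eldest entry
        cache.put("c", "3"); // evicts "b"
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

It works, until you want to cap the cache in bytes instead of entries, expire stale data, or see the hit rate. That's the point at which a real cache earns its keep.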


Avoid The Slippery Slope:

It's a slippery slope. First you add in one feature, then another, and the next thing you know you've reinvented the cache wheel. No point in doing that. There are great caches out there that are Apache-licensed, lightweight, and have the features above that you need, and more.


To learn more about Ehcache check it out at ehcache.org »