Wednesday, December 16, 2009

Clustered Quartz Scheduler In 5 Minutes

Need a fast clustered/persistent Quartz job scheduler? Quartz Scheduler, the ubiquitous job scheduler built into Spring, JBoss and Grails, can be configured to provide those features in under 5 minutes with this brief tutorial.

A Brief Digression Into The Why

Why do I need a persistent, scaled-out job scheduler? The main use cases for a clustered/persistent job scheduler are:
  • HA - You need to be able to restart your application without losing scheduled jobs.
  • Scale-out - Your application now needs more than one node to handle the load it receives, and you want your scheduled jobs to distribute across nodes.
  • You are using a database to persist and/or scale your scheduler, and you are seeing DB load/locking issues, or you find it too difficult to set up or maintain.

Steps:

1) Download Terracotta 3.2 (which includes Quartz Scheduler): http://www.terracotta.org/dl/oss-download-catalog

2) Put the following jars on your classpath (both included in the quartz-1.7.0 directory of the Terracotta kit from above):

quartz-1.7.0.jar - the regular Quartz jar
quartz-terracotta-1.0.0.jar - the Terracotta clustered store

3) Whip up some scheduler code:

import java.util.Properties;

import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerUtils;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzSample {

    public void startJobs() throws Exception {
        // Start from the default Quartz properties shipped inside the quartz jar
        Properties props = new Properties();
        props.load(QuartzSample.class.getClassLoader().getResourceAsStream("org/quartz/quartz.properties"));

        // **** Begin required Terracotta props
        props.setProperty(StdSchedulerFactory.PROP_JOB_STORE_CLASS, "org.terracotta.quartz.TerracottaJobStore");
        props.setProperty("org.quartz.jobStore.tcConfigUrl", "localhost:9510");
        // **** End required Terracotta props

        StdSchedulerFactory factory = new StdSchedulerFactory(props);
        Scheduler scheduler = factory.getScheduler();
        scheduler.start();

        // Only schedule the job if a previous run hasn't already done so
        if (scheduler.getJobDetail("myJob", "myGroup") == null) {
            System.out.println("Scheduling Job!");
            JobDetail jobDetail = new JobDetail("myJob", "myGroup", DumbJob.class);
            Trigger trigger = TriggerUtils.makeSecondlyTrigger(5);
            trigger.setName("myTrigger");
            scheduler.scheduleJob(jobDetail, trigger);
        } else {
            System.out.println("Job Already Scheduled!");
        }
    }

    public static class DumbJob implements Job {
        @Override
        public void execute(JobExecutionContext context) throws JobExecutionException {
            System.out.println("Works baby");
        }
    }

    public static void main(String[] args) throws Exception {
        new QuartzSample().startJobs();
    }
}

** NOTE: Notice the two property lines that set things up for clustering with Terracotta in the sample. That's the only difference from single-node, unclustered Quartz.
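If you'd rather keep the clustering bits out of the code, the same two settings can live in a quartz.properties file on your classpath instead; a minimal sketch (the host:port should point at wherever your Terracotta server from step 4 runs):

# Hand Quartz off to the Terracotta clustered job store
org.quartz.jobStore.class = org.terracotta.quartz.TerracottaJobStore
org.quartz.jobStore.tcConfigUrl = localhost:9510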

4) Start the Terracotta server by running

./start-tc-server.sh

in the bin directory of the Terracotta kit

5) Run the sample code above and watch it run the job every 5 seconds. Then kill the sample app and restart it. The app will tell you that the job is already scheduled, and the job will keep firing on schedule.
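Illustratively, interleaved with Quartz's own startup logging, the sample's printlns look something like this across the two runs:

# first run
Scheduling Job!
Works baby
Works baby
...
# after kill and restart
Job Already Scheduled!
Works baby
...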

Conclusion

Two lines of configuration and a server take you from the ubiquitous job scheduler built into Spring, JBoss and Grails to scale-out and persistence.

Have fun!

Friday, November 6, 2009

Welcome James House and The Quartz Community

I'm excited to be welcoming James House and Quartz to the Terracotta and Ehcache fold. The Terracotta dev and field teams have long believed that scheduling and coordination are hugely important parts of applications, from single node to scaled-out architectures. In Java that leads you to one place: Quartz! We believe James and Quartz are an excellent fit with our community and our suite of products, which are useful from a single node to the cloud.

Quartz Scheduler is both a best-of-breed and a ubiquitous product. We are getting right to work on contributing. The first step along the path, the Mavenization and Hudsonification of the Quartz project, has already been completed! We are also ready with a beta version of Quartz Terracotta Express edition: an HA/durability/scale-out version of Quartz that requires no DB and is so simple it can be leveraged in minutes by existing Quartz users. We look forward to working with James to create the most usable and useful enterprise-class open source scheduler available, and the enterprise-class support product to go with it. We have lots of great feature ideas and we look forward to the journey ahead.


Monday, October 26, 2009

5 Hints You're Using A Map When You Should Be Using a Cache

When developing software in Java, one almost always ends up with a few maps that contain keyed data, whether it's username -> conversational state, state code -> full state name, or cached data from a database. At what point do you move from a basic Map (or one of its fancier variants like a LinkedHashMap subclass or ConcurrentHashMap) to an open source, lightweight cache like Ehcache?

Here are 5 quick things to look for:

5) You've built your own Map-loader framework for bootstrapping and/or reads triggering loading.

4) You need to be able to visualize and/or control your Map via JMX or a console. For example, you want to watch the hit rate of your Map or track its size.

3) You're hacking in "overflow to disk" functionality and/or persistence for your Map in order to handle memory pressure and/or restartability.

2) You're hacking in special behavior to cluster Maps when you scale out. This includes things like writing your own invalidation and/or replication solution over JMS or RMI.

1) You find yourself implementing your own eviction strategies.


Avoid The Slippery Slope:

It's a slippery slope: first you add in one feature, then another, and the next thing you know you've reinvented the cache wheel. There's no point in doing that. There are great caches out there that are Apache-licensed, lightweight, and have the features above and more.
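To make that concrete, here's a minimal sketch using the classic Ehcache API (the cache name, sizes and values are made up); one constructor call buys you a bounded size, LRU eviction, overflow to disk, and TTL/TTI expiry:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class CacheInsteadOfMap {
    public static void main(String[] args) {
        // name, max in-memory entries, overflow to disk, eternal?, TTL secs, TTI secs
        Cache users = new Cache("users", 10000, true, false, 300, 60);
        CacheManager.getInstance().addCache(users);

        users.put(new Element("user42", "conversational state"));
        Element hit = users.get("user42"); // counted in the cache's hit statistics
        System.out.println(hit.getValue());

        CacheManager.getInstance().shutdown();
    }
}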


To learn more about Ehcache, check it out at ehcache.org.

Friday, October 23, 2009

Excellent Blog On Code Smells...

This is a really good, short blog post that highlights some code smells everyone should look out for. It's not a complete catalog or anything, but when I read it I felt like it could have been written by me.


He also has a follow-on post, which I agree with.


Improved Web Browsing By Controlling Flash

I have a few pet annoyances when surfing the web:

* I find it disruptive when audio plays as soon as I hit a web page (though I usually keep my sound off).
* I don't like when video plays automatically when I hit a web page.
* I don't like when my computer heats up and the battery drains while I'm not doing anything, just because I left a web page/tab open.
* Some pages that look rather slim take a disproportionately long time to load (there are lots of reasons for this, but Flash seems to be one of them).

Turns out I was able to mostly solve those problems by using one of the many Flash-control plugins. I surf on Safari for the most part, so I went with ClickToFlash. Firefox has FlashBlock, which I haven't tried.

The way it works is that it shows you placeholder frames where the Flash would normally be. If you want to see what's there, just click on the box and it loads the real Flash content. It has many other nice features, but the important one is the one I described.

I have to say, when I installed this thing I was absolutely amazed by how many things that looked like regular ads and images were actually Flash. You will be astounded. Gotta wonder what these companies are doing with Flash when they're showing a static image. I didn't take any official benchmarks, but after installing it I noticed an increase in battery life and a decrease in heat on my computer. This experience makes me actually believe (I didn't at first) Apple's battery/CPU excuse for not supporting Flash on the iPhone.

If you're like me and only want Flash when you want Flash, try it out.


Friday, October 2, 2009

Distributed Coherent Ehcache In Less Than 5 Minutes...

Need a fast clustered/persistent cache? Ehcache, the ubiquitous cache built into Spring, JBoss and Grails, can be configured to provide those features in under 5 minutes using this brief tutorial.

A Brief Digression Into The Why

Why do I need a persistent, scaled-out cache? The main use cases for a clustered/persistent cache are:
  • I'm using Hibernate and it's pounding the database, or it's too slow. Use a coherent second-level cache to deflect load off the database and reduce latency without getting stale data.
  • I have a bunch of intermediate data that doesn't belong in the database and/or is expensive to store there, and I want to keep it in memory. The problem is that if a node goes down, or if someone asks for the data from another node, the data is lost.
  • I'm already caching, but I have to load data over and over again into every node even though hot data for one node is hot for all (known as the 1/n effect). With a clustered cache, if the data is cached for one node it is cached for all.

Steps:

1) Download the latest Ehcache: www.ehcache.org

2) Put the following jars in your classpath (all included in the Ehcache kit):

ehcache-core.jar - core Ehcache
ehcache-terracotta.jar - Terracotta clustering
slf4j-api-1.5.8.jar - logging API used by Ehcache
slf4j-jdk14-1.5.8.jar - an implementation of the logging API

3) Whip up some cache code:

package org.sharrissf.samples;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class MyFirstEhcacheSample {
    // Points at the config file created in step 4
    CacheManager cacheManager = new CacheManager("src/ehcache.xml");

    public MyFirstEhcacheSample() {
        Cache cache = cacheManager.getCache("testCache");
        int cacheSize = cache.getKeys().size();

        // Add one more entry each run; the clustered cache remembers prior runs
        cache.put(new Element("" + cacheSize, cacheSize));

        for (Object key : cache.getKeys()) {
            System.out.println("Key:" + key);
        }
    }

    public static void main(String[] args) throws Exception {
        new MyFirstEhcacheSample();
    }
}



4) Whip up some quick config:

<?xml version="1.0" encoding="UTF-8"?>
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="ehcache.xsd">

    <terracottaConfig url="localhost:9510" />

    <defaultCache />

    <cache name="testCache" eternal="true">
        <terracotta clustered="true" />
    </cache>

</ehcache>



5) Download Terracotta: http://www.terracotta.org/dl/oss-download-catalog

6) Start the Terracotta server by running ./start-tc-server.sh in the bin directory of the Terracotta kit

Now just run that Java snippet a few times and see your cache grow.
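Illustratively (key order isn't guaranteed), the first run prints Key:0, the second prints Key:0 and Key:1, the third adds Key:2, and so on: each run's new entry lands in the clustered store and survives the JVM exiting.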

Tuesday, August 18, 2009

Welcome EHCache Community

I'm excited to be welcoming Greg Luck and the EHCache community to the Terracotta family. EHCache is an extremely useful and usable product, and nearly ubiquitous in the caching space. Greg has spent years solving the important real-world problems associated with building highly performant applications. The Terracotta dev team is very much looking forward to helping accelerate EHCache's development, as well as providing the best possible integration with the Terracotta product family.

EHCache will remain under the Apache 2 license, and we have created the beginnings of a new website at www.ehcache.org. Greg will continue to drive EHCache's vision and direction, as well as being highly involved in its development. He will also be instrumental in helping Terracotta define and build out our caching strategy as a whole. His vision, as well as the EHCache community's help, are essential in allowing us to take these products to the next level together.

We see a great future of product offerings for your desktop app, on your servers and in your cloud, solving the scale/performance problems of today, tomorrow and beyond.

Wednesday, August 12, 2009

Distributed Data Structures: ConcurrentDistributedMap

Concurrent Distributed Data Structures?

Many challenges exist when developing a high-scale, multi-node application. Our goal at Terracotta is to take on those challenges in ways that remove them from the plate of those architecting and developing applications and place them squarely on our shoulders.

In order to accomplish such a lofty goal we first had to create some core pieces of infrastructure on which many higher-order abstractions could be built. One such piece is our ConcurrentDistributedMap. This data structure is a fundamental piece of our Distributed Cache, our Hibernate product and our Web Sessions product, and is also available for use in custom solutions for those using Terracotta as a platform.


Challenges and Tradeoffs

Developing a data structure that is distributed as well as concurrent and coherent involves very different trade-offs from developing for a single JVM. If one took a standard concurrent data structure like ConcurrentHashMap and just clustered it "as is", one would likely run into performance and memory-efficiency issues. Even a really cool concurrent data structure like Cliff Click's non-blocking hash map would not do well if one used its algorithms without thought in a coherent cluster.

The challenge is that the trade-offs change when you add the latency of a network and data locality into the game. In normal concurrent data structures you care about:

- How long you hold locks
- How much is locked while you hold it
- CPU usage
- Memory usage and object creation

In the clustered case you add the following:

Lock locality - Is the lock you need already held on the local machine, or do you need to go get it over the network? If you need to go get it, how long does that take? While a little of the "how long does it take to get the lock" question exists on a multi-CPU single machine, it's not nearly to the same degree.

Data locality - Is the data I need already local, or do I need to go get it? If I need to get it, how long does that take?

Data change rate - How much clustered data am I changing, and how long does it take to send it around? Also, do I need to send it around at all?

Data size - In a clustered world one often uses data structures that don't fit entirely in a single node. One has to take pains to control the size and amount of the data in each JVM for efficiency.

There are other implementation-specific, point-in-time issues like the number of locks and their cost, but those can mostly be optimized away at the platform level.


Single JVM ConcurrentHashMap

ConcurrentHashMap adds concurrency by collecting groups of entries into segments. Those segments are grouped together both from a lock perspective (they share a lock) and from a physical-space perspective (all entries in a segment are generally in one collection). In a single JVM, the only risk of sharing a lock between the entries is that one can contend on in-memory-speed look-ups. This is a very effective way to handle large numbers of threads making highly contended gets and puts to the map. If one runs into contention with this kind of data structure, one can just up the number of segments in the map.
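In code, "upping the number of segments" is just the third argument of the standard JDK constructor; the sizes below are arbitrary:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SegmentTuning {
    public static void main(String[] args) {
        // initialCapacity 1024, loadFactor 0.75, concurrencyLevel 64:
        // the map sizes its internal segment (and lock) count from the last argument
        ConcurrentMap<String, String> map =
                new ConcurrentHashMap<String, String>(1024, 0.75f, 64);
        map.put("user42", "state");
        System.out.println(map.get("user42"));
    }
}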


Concurrent Map In A Clustered World

In a clustered world, problems occur with a data structure like this. First, getting a lock or an object can be at in-memory speed, or can take many times in-memory speed, depending on whether it has recently been accessed locally. In some cases this is no problem, and in some cases it's pretty bad. It's also a space issue. If a segment is brought in as a whole, and entries are in that segment strictly because of their hashCode, then the natural partitioning of the app's usage won't help save space by only loading the entries needed locally. Instead, each node will load the objects it needs plus anything else in their segments. This eliminates the benefits of any natural or forced locality that occurs in a multi-node application.


Use-Case Analysis

In order to highlight some of the pros and cons of CHM (ConcurrentHashMap), I'm going to vet it against a few use-cases.

Use-case 1 - An 8-node app sharing a clustered ConcurrentHashMap

All the data in the map is read-only, it's used in all nodes evenly, and the data fits entirely in a single JVM's heap.

GOOD NEWS! You will be fine with a regular clustered ConcurrentHashMap. Let's look at why:

1) All data will be loaded everywhere, so unnecessary faulting (the act of pulling a data item into a node) won't be happening
2) All locks will be read locks and will be local everywhere, so your latency will be nice and low (due to greedy locks)
3) You won't have contention on the segments, because reads are pretty much concurrent

Use-case 2 - The same as use-case 1, but now the map data is bigger than memory and you have a sticky load balancer.

Some good and some bad:

1) Since data is batched into segments by hash code, and your load balancer hashes on something completely different than your map does, you will end up loading data into each node that is not needed there. This is a result of the ConcurrentHashMap segmenting strategy.

2) Locks will still be fine, because it's all reads and read locks are very concurrent, so segment contention won't be an issue.

So the memory manager may be doing unnecessary work, and whether you will be in trouble depends on how big the ConcurrentHashMap is.

Use-case 3 - Same as use-case 2, except that now we are doing 50 percent writes. Something similar to caching conversations.

1) You still have the above problem of loading unneeded batches
2) But now, due to the writes, you are also maintaining the state of objects that have unnecessarily poor locality in all the nodes where they don't belong
3) Now you have a locking problem. While writing an entry to a segment you are blocking people in other nodes from reading or writing to that segment, adding some serious latency. Plus, the locks are getting pulled around to different nodes, because even though your load balancer provides locality, it is on a different dimension than the internals of the map and is therefore not helpful

Reviewing the problems highlighted by use-case 3:

- Lock hopping leading to slow lock retrieval
- Lock contention due to multiple unrelated entries sharing locks
- Faulting and memory waste due to unfortunate segmenting of data
- Broadcasting of changes or invalidations to nodes that shouldn't care


What did we do?

We built a specialty, highly concurrent map tuned for distribution and the above challenges, called ConcurrentDistributedMap.

Locking:
Instead of breaking things down into segments for locking, we lock on the individual keys in the map. This gives the same correctness guarantees while giving the maximum concurrency. It drastically reduces lock hopping and contention, and provides in-memory lock speeds most of the time.
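Terracotta's actual implementation works cluster-wide, but the flavor of key-level locking can be sketched in a single JVM (purely conceptual; the names are made up): each key lazily gets its own lock, so compound operations on different keys never contend with each other.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Conceptual only: one lock per key instead of one lock per segment
public class PerKeyLocking {
    private final ConcurrentMap<String, Long> data = new ConcurrentHashMap<String, Long>();
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<String, ReentrantLock>();

    private ReentrantLock lockFor(String key) {
        ReentrantLock lock = locks.get(key);
        if (lock == null) {
            ReentrantLock candidate = new ReentrantLock();
            ReentrantLock raced = locks.putIfAbsent(key, candidate);
            lock = (raced == null) ? candidate : raced;
        }
        return lock;
    }

    // Atomic read-modify-write on a single key; other keys stay fully concurrent
    public long increment(String key) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        try {
            Long current = data.get(key);
            long next = (current == null) ? 1 : current + 1;
            data.put(key, next);
            return next;
        } finally {
            lock.unlock();
        }
    }
}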


Segmenting:
The segments go away completely. Key-value pairs are managed on an individual basis, so no unnecessary faulting occurs.


Broadcasting and invalidation:
The above, plus an efficient memory manager, means that values are only faulted into the nodes where they are used. Since those values aren't in all nodes anymore, invalidation and/or broadcasting of changes for those entries is no longer needed.

This data structure takes excellent advantage of any natural partitioning that may occur at the application level.


Summary

Building a fast, coherent, concurrent, distributed data structure requires thinking about an extended set of concerns. However, if one pays attention to the issues, it is possible to create a highly useful solution. To learn more, check out the ConcurrentDistributedMap described above.




Friday, February 6, 2009

Maven Is Cooler Than You Think...

I'm sure I'm not the only one who has heard people curse Maven. But Maven is cooler than you think. Back in the day, when I wanted to start a project I always had to get a whole bunch of gunk set up before I even wrote a line of code, especially when trying a new framework or tool. Today I was whipping up a new project for a simple micro-benchmark on some Terracotta stuff, and it reminded me why Maven really can be quite awesome. It took me 10 minutes and about 7 steps. The next time around I won't need to do the installs, and then it's 4 steps.

These were the steps I took to get started:

1) Install Maven

2) Used the Pojo archetype to create the build and test environment for my project
- Creates a Mavenized directory structure ready for build, test, run, etc. Hooks up the Terracotta Maven plugin as well
- Make sure you replace the group id and artifact id in the command line
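The exact command line isn't reproduced here, but a Maven archetype invocation runs along these general lines (the archetype coordinates below are illustrative placeholders, and the groupId/artifactId are yours to choose):

mvn archetype:generate \
    -DarchetypeGroupId=org.terracotta.maven.archetypes \
    -DarchetypeArtifactId=pojo-archetype \
    -DgroupId=com.mycompany \
    -DartifactId=my-tc-benchmark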

updated - with the latest Eclipse plugin this step is unnecessary
X 3) In my new project directory, type: "mvn eclipse:m2eclipse"
- This takes your Maven project and readies it for Eclipse

4) Install the Maven Eclipse plugin (I already had Eclipse installed)
- Makes dealing with Maven from Eclipse much easier

5) Install the Terracotta Eclipse plugin
- Makes dealing with Terracotta from Eclipse much easier

6) File -> Import -> Maven Projects and import your project into Eclipse
- Loads the project directory created by the archetype into Eclipse

7) Select the project and hit Terracotta -> Add Terracotta Nature

What you end up with is a complete project setup, ready to be built and tested from both Eclipse and the command line using Maven.

It literally took me about 10 minutes to get started. Notice what you didn't have to do:

1) Didn't have to build a pom.xml or other kind of build file
2) Didn't have to download or install Terracotta or any of its pieces
3) Didn't have to think about your directory structure, where to put your tests, or how to run them
4) Didn't have to figure out how to do all this stuff in Eclipse or on the command line

Sure, Maven can be challenging at times, but in cases like this, when the vendors have things set up for you, it can be a huge time saver.

update:
Looks like we've reduced the number of steps to 6 the first time through and 3 after that. If we take the suggestion about auto-applying the Terracotta Nature in the archetype, we could reduce it to 5 and 2.