Scaling software should be something that Hardware, Cloud, and System Administrators can do with ease, without knowledge of the applications being scaled.
Wednesday, December 15, 2010
Ehcache To The Rescue (Comic Strip)
Monday, November 29, 2010
Quartz Scheduler 2.0 Beta 1 Welcomes New Fluent API and "Where"
- Simplify/modernize the Quartz API.
- Improve the Quartz experience when leveraging a cluster
- The date/time-related methods have been moved off the Trigger and Job classes and into a date-building class called "DateBuilder"
- We've removed the need to know details about which Job and Trigger classes you need and instead infer them through the building methods you call.
- The construction now reads more like a sentence: newJob(...).withIdentity("job1", "group1"); newTrigger().withIdentity("trigger1", "group1").startAt(runTime). See the sketch below.
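To make that concrete, here is a minimal sketch against the 2.0 builder API. HelloJob and the 5-minute start time are made up for illustration; check the Quartz 2.0 docs for the full set of builders.

import static org.quartz.DateBuilder.futureDate;
import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;

import org.quartz.DateBuilder.IntervalUnit;
import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.impl.StdSchedulerFactory;

public class FluentApiSample {

    // Hypothetical job used only for this sketch
    public static class HelloJob implements Job {
        public void execute(JobExecutionContext context) {
            System.out.println("Hello from " + context.getJobDetail().getKey());
        }
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

        // The builder infers the JobDetail implementation for you
        JobDetail job = newJob(HelloJob.class)
                .withIdentity("job1", "group1")
                .build();

        // Date/time helpers now live on DateBuilder rather than on Trigger
        Trigger trigger = newTrigger()
                .withIdentity("trigger1", "group1")
                .startAt(futureDate(5, IntervalUnit.MINUTES))
                .build();

        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}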
Ehcache 2.4 Beta 1 Welcomes Search, Local Transactions and more...
- A bit of annoying coding
- Only practical for unclustered caches
- Transactions without a JTA transaction manager
- More speed
- NonStopCache is now built in. Rather than having to add a jar and configure a wrapper to get non-stop characteristics in clustered land, this is now built into the product core and can be turned on via configuration
- Search now works clustered - The new search API is backed by the Terracotta tier. This is still early and we have a lot of performance and HA work to do here. That said, it is testable and usable, so give it a try (see the sketch after this list).
- Explicit locking module is now in the core kit
- Rejoin now works in non-stop (You can disconnect from a cluster and reconnect to that cluster without restarting)
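For a feel of the search API, here is a minimal sketch. The "people" cache name, the Person value class and the "age" attribute are assumptions for illustration; the cache would need to be marked searchable with a matching searchAttribute in ehcache.xml.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.search.Attribute;
import net.sf.ehcache.search.Query;
import net.sf.ehcache.search.Result;
import net.sf.ehcache.search.Results;

public class MyFirstSearchSample {
    public static void main(String[] args) {
        // Assumes ehcache.xml defines a "people" cache marked <searchable>
        // with a <searchAttribute name="age"/> extracted from the cached values
        CacheManager manager = CacheManager.create();
        Cache cache = manager.getCache("people");

        cache.put(new Element(1, new Person("Ari", 35)));
        cache.put(new Element(2, new Person("Pascal", 29)));

        // Build and run a query: keys of everyone older than 30
        Attribute<Integer> age = cache.getSearchAttribute("age");
        Query query = cache.createQuery().includeKeys().addCriteria(age.gt(30));
        Results results = query.execute();
        for (Result result : results.all()) {
            System.out.println("Matched key: " + result.getKey());
        }
        manager.shutdown();
    }

    // Simple value type; the getAge() bean property backs the "age" search attribute
    public static class Person implements java.io.Serializable {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        public String getName() { return name; }
        public int getAge() { return age; }
    }
}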
Monday, November 15, 2010
Direct Buffer Access Is Slow, Really?
Type: ONHEAP Took: 8978 to write and read: 10737418368
Type: DIRECT Took: 9223 to write and read: 10737418368
Type: ONHEAP Took: 8827 to write and read: 10737418368
Type: DIRECT Took: 9283 to write and read: 10737418368
Type: ONHEAP Took: 8813 to write and read: 10737418368
Type: DIRECT Took: 9604 to write and read: 10737418368
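For context, this is roughly the shape of the micro-benchmark that produces output like the above. The buffer size, the use of putLong/getLong and the loop structure are assumptions rather than the exact harness used; the point is simply to write and read about 10 GiB through a heap buffer and a direct buffer and time each pass.

import java.nio.ByteBuffer;

public class BufferBench {
    private static final long TOTAL_BYTES = 10L * 1024 * 1024 * 1024; // ~10 GiB per pass
    private static final int BUFFER_SIZE = 64 * 1024 * 1024;          // one 64 MiB buffer, reused

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            run("ONHEAP", ByteBuffer.allocate(BUFFER_SIZE));
            run("DIRECT", ByteBuffer.allocateDirect(BUFFER_SIZE));
        }
    }

    private static void run(String type, ByteBuffer buffer) {
        long start = System.currentTimeMillis();
        long bytes = 0;
        while (bytes < TOTAL_BYTES) {
            buffer.clear();
            while (buffer.remaining() >= 8) {
                buffer.putLong(bytes);   // fill the buffer with longs
            }
            buffer.flip();
            while (buffer.remaining() >= 8) {
                buffer.getLong();        // read the same data back
            }
            bytes += BUFFER_SIZE;
        }
        long took = System.currentTimeMillis() - start;
        System.out.println("Type: " + type + " Took: " + took + " to write and read: " + bytes);
    }
}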
Friday, November 5, 2010
A Couple Minutes With Ehcache Search...
Wednesday, October 6, 2010
A Couple Minutes With Ehcache BigMemory Pounder...
- Get the Ehcache with BigMemory Beta and a license key to use it.
- Get the Standalone Ehcache Pounder distribution
- Unpack the Ehcache with BigMemory distribution
- Copy the Standalone Ehcache Pounder kit into the ehcache kit and unpack it
- Copy your license file and your ehcache core jar into the pounder kit
Wednesday, September 15, 2010
A Little Bit About BigMemory for Ehcache and Terracotta ...
In talking to our users, it is clear that applications are getting more and more data hungry. According to IDC, data requirements are growing at an annual rate of 60 percent. This trend is driven further by cloud computing platforms, company consolidation and huge application platforms like Facebook. There is good news, though. Server-class machines purchased this year have a minimum of 8 Gig of RAM and likely have 32 Gig. Cisco is now selling mainstream UCS boxes with over 380 Gig of RAM (which I have tried, and it is amazing). On EC2 you can borrow 68.4 Gig machines for 2 dollars an hour (I have also tried this, and it is also pretty amazing). Memory has gotten big and extremely cheap compared to things like developer time and user satisfaction.
Unfortunately, a problem exists as well. For Java/JVM applications it is becoming an ever-increasing challenge to use all that data and memory. At the same time that the data/memory explosion is occurring, the amount of heap a Java process can effectively use has stayed largely unchanged. This is due to the ever-increasing garbage collection pauses that occur as a Java heap gets large. We see this issue at our customers, but we also see it here at Terracotta tuning our own products and the products we use, like third-party app servers, bug tracking systems, CMSs and the like. How many times have you heard "run lots of JVMs" or "don't grow the heap" from your vendors and/or devs?
So we set out to first identify the problem as it exists today, both in the wild and in-house. We then created a solution, first for us (an internal customer) and then for all of the millions of nodes of Ehcache out there (all of you).
3 Big Problems Seen by Java Applications
My Application is too slow
My application can't keep up with my users. I've got tens of gigs of data in my database, but it's overloaded and/or too slow to service my needs, either due to the complicated nature of my queries or the volume of those queries. I want my data closer to the application, so of course I start caching. Caching helps, but I want to cache more. My machine has 16 gigs of RAM, but if I grow my heap that large, I get too many Java GC pauses.
My Application's latencies aren't predictable
On average my Java application is plenty fast, but I see pauses that are unacceptable to my users. I can't meet my SLAs due to the size of my heap combined with Java GC pauses.
My software/deployment is too complicated
I've solved the Java GC problem. I run with many JVMs with heap sizes of 1-2 gigs. I partition my data and/or load balance to get the performance and availability I need, but my setup is complicated to manage because I need so many JVMs and I need to make sure the right data is in the right places. I fill up all 64 gigs of RAM on my machine, but it's too hard and fragile.
The other problem
Like many vendors, in the past we told our users to keep their heaps under 6 gig. This forced our customers either to not completely leverage the memory and/or CPU on the machines they purchased, or to stack JVMs on a machine. The former is expensive and inefficient, and the latter fragile and complex.
Here is a quick picture of what people do with their Java Applications today:
Base Case - Small heap JVM on a big machine because GC pauses are a problem
Big heap - Has long GCs that are complicated to manage
Stacked small JVM heaps - This, in combination with various sharding, load balancing and clustering techniques, is often used. It is complicated to manage, and if all the nodes GC at the same time it can lead to availability problems.
What kind of solution would help?
Here's what we believe are the requirements for a stand-alone caching solution that attacks the above problems.
- Hold a large dataset in memory without impacting GC (10s-100s of gigs) - The more data that is cached, the less you have to go to your external data source and/or disk, and the faster the app goes
- Be Fast - needs to meet the SLA
- Stay Fast - Don't fragment, don't slow down as the data is changed over time
- Concurrent - Scales with CPUs and threads. No lock contention
- Predictable - can't have pauses if I want to make my SLA
- Needs to be 100 percent Java and work on your JVM on your OS
- Restartable - A big cache like this needs to be restartable because it takes too long to build
- Should just Snap-in and work - not a lot of complexity
What have we built?
First we built a core piece of technology, BigMemory: an off-heap, direct memory buffer store with a highly optimized memory manager that meets and/or exceeds requirements 1-6 above. This piece of technology is currently being applied in two ways:
1) Terracotta Server Array - We sold it to our built-in customer, the Terracotta Server team, who can now create individual nodes of our L2 caches that hold a hundred million entries and leverage tens of gigs of memory, pause-free and with linear TPS. This leverages entire machines (even big ones) with a single JVM for higher availability, a simpler deployment model, 8x improved density and rock-steady latencies.
2) Ehcache - We've added BigMemory and a new disk store to Enterprise Ehcache to create a new tiered store, adding in requirements 7-8 from above (snap-in simplicity and restartability). The Ehcache world at large can benefit from this store just as much as the Terracotta products do.
Check out the diagram below.
Typically, using either of the BigMemory-backed products, you shrink your heap and grow your cache. By doing so, SLAs become easier to meet because GC pauses pretty much go away, and you are able to keep a huge chunk of data in memory.
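As a rough illustration of what the snap-in looks like on the Ehcache side, here is a hypothetical cache entry in the 2.3-era BigMemory style; the cache name and sizes are placeholders, and the exact attribute names should be checked against the BigMemory docs linked below:

<cache name="bigDataCache"
       maxElementsInMemory="10000"
       eternal="true"
       overflowToOffHeap="true"
       maxMemoryOffHeap="30g"/>

The JVM also needs enough direct memory to back the off-heap store, e.g. by starting it with something like -XX:MaxDirectMemorySize=32g.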
Summing up
Memory is cheap and growing. Data is important and growing just as fast. Java's GC pauses are preventing applications from keeping up with your hardware. So do what every other layer of your software and hardware stack does: cache. But in Java, the large heaps needed to hold your cache can hurt performance due to GC pauses. So use a tiered cache with BigMemory that leverages your whole machine and keeps your data as close to where it is needed as possible. That's what Terracotta is doing for its products. Do so simply, i.e. snap it into Ehcache, and have large caches without the pauses caused by GC. The result is a simpler architecture with improved performance/density and better SLAs.
Learn more at http://terracotta.org/bigmemory
Check the Ehcache BigMemory docs
Wednesday, July 28, 2010
Application Scale and Quartz "Where"
Thursday, July 22, 2010
Hiring at Terracotta - Performance/Testing Engineer and/or Lead
Tuesday, July 20, 2010
Hiring at Terracotta...
- Works hard and solves hard problems
- Strong OO/Framework design sense
- Understand the Java landscape in a deep way (e.g. Spring, J2EE, Ehcache, Quartz, REST, SOAP, NoSQL)
- Excited about performance, caching and scale-out
- Love to code and write tests
- Works well in a team and individually
- Believes that the only way to know if something works is to test it in a repeatable way
- Experience with open-source
- Live in/near San Francisco
- Tech Lead experience
- Design and build the next generation of scale-out, performance and HA software.
- Contribute to and extend Terracotta, Ehcache and/or Quartz Scheduler, some of the most popular and widely used frameworks in Java
Thursday, June 24, 2010
A Couple Minutes With Some Toolkit Samples
- Downloading and unpacking the nightly build or beta from Terracotta
- Starting the Terracotta server by calling ./bin/start-tc-server.sh
- Running 2 instances of the compiled versions of any of the above
Wednesday, June 9, 2010
A Couple Minutes With The Terracotta 3.3 Beta
- Simple Scale - Reduce the need for tuning and tweaking with an improved next gen datastore. It will allow the everyday user to achieve the kinds of scale needed for massive applications both in data size and number of nodes.
- Improved Visibility - We have added panels for Quartz and Sessions to our developer console, giving visibility into the full suite of performance and scale-out products. We have also added a more product-focused organization of information in the tool.
- Simple HA - A new panel that makes it easier to monitor interesting events that occur in a cluster. Pre-built templates for various configurations. Simplified way of migrating nodes, better defaults.
- Modularity - We have exposed some of our most powerful pieces and parts as a versioned standard API that can simply be coded against to get things like locking, queuing, maps and cluster topology. We use this API to build all four of our core products (Ehcache, Quartz, Hibernate 2nd level cache, Web Sessions).
Tuesday, May 18, 2010
Steve Jobs Stanford Commencement
Friday, May 7, 2010
A Couple Minutes With Non-Stop Ehcache
- Download Ehcache 2.1
- Download the NonStopCache 1.0
- Start the Terracotta server
- Run the program (source code below)
Regular cache. No Decorator
The size of the cache is: 0
After put the size is: 1
Here are the keys:
Key:0
Done with cache.
Sleeping, Stop your server
Disconnected NonStop with noop cache.
The size of the cache is: 0
After put the size is: 0
Here are the keys:
Done with cache.
Disconnected NonStop with local reads cache.
The size of the cache is: 1
After put the size is: 1
Here are the keys:
Key:0
Done with cache.
Disconnected NonStop with exception cache.
Exception in thread "main" net.sf.ehcache.constructs.nonstop.NonStopCacheException: getKeys timed out
at net.sf.ehcache.constructs.nonstop.behavior.ExceptionOnTimeoutBehavior.getKeys(ExceptionOnTimeoutBehavior.java:114)
at net.sf.ehcache.constructs.nonstop.behavior.ClusterOfflineBehavior.getKeys(ClusterOfflineBehavior.java:120)
at net.sf.ehcache.constructs.nonstop.NonStopCache.getKeys(NonStopCache.java:264)
at MyFirstNonStopEhcacheSample.addToCacheAndPrint(MyFirstNonStopEhcacheSample.java:45)
at MyFirstNonStopEhcacheSample.<init>(MyFirstNonStopEhcacheSample.java:40)
at MyFirstNonStopEhcacheSample.main(MyFirstNonStopEhcacheSample.java:60)
What Just Happened?
The data is first loaded into an ordinary, undecorated cache. This is performed before the server kill and proceeds without incident. The next round of operations on the cache was performed with the server down.
- These decorators are all being used on the same cache. This way you can make the behavior specific to the user of the cache. It gives tremendous flexibility.
- You'll notice this little sample flies through despite the timeout being set to 13 seconds. This is because it's in fail-fast mode. In this configurable mode, if the cache knows it can't communicate, it returns the failure case immediately. If that's not what you want, you can instead set it up to not fail fast and wait the full timeout no matter what.
- I did this work in config but the same setup can be done in code
And the Config file ehcachenonstop.xml:
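(The config itself didn't survive the copy/paste, but it looks roughly like the sketch below: a Terracotta-clustered cache with the NonStopCache decorator factory attached. The factory class and property names are taken from the NonStopCache 1.0 docs as best I recall them, the cache names are placeholders, and the noop and localReads variants just change the timeoutBehavior property.)

<ehcache name="nonstopSample">
  <terracottaConfig url="localhost:9510"/>

  <cache name="exceptionCache" maxElementsInMemory="10000" eternal="true">
    <terracotta/>
    <cacheDecoratorFactory
        class="net.sf.ehcache.constructs.nonstop.NonStopCacheDecoratorFactory"
        properties="name=nonStopExceptionCache, timeoutMillis=13000,
                    timeoutBehavior=exception, immediateTimeout=true"/>
  </cache>
</ehcache>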
Saturday, May 1, 2010
A Couple Minutes With Terracotta Toolkit Nightly
- Downloaded the nightly and unpacked
- Grabbed a quick sample app
import org.terracotta.api.ClusteringToolkit;
import org.terracotta.api.TerracottaClient;
import org.terracotta.coordination.Barrier;
public class PlayingWithExpressBarrier {
public static void main(String[] args) {
final String barrierName = args[0];
final int numberOfParties = Integer.parseInt(args[1]);
//Start the Terracotta client
ClusteringToolkit clustering = new TerracottaClient(
"localhost:9510").getToolkit();
//Get an instance of a barrier by name
Barrier barrier = clustering.getBarrier(barrierName,
numberOfParties);
try {
System.out.println("Waiting ...");
int index = barrier.await();
System.out.println("... finished " + index);
} catch (Exception e) {
e.printStackTrace();
}
}
}
- I worked in Eclipse, so at this point all I had to do was add the toolkit jar to the classpath to get it to compile
- Now kickoff the Terracotta server
- And run the sample 3 times
Thursday, April 29, 2010
Countdown To The Terracotta Toolkit Beta
- Ease of use is paramount - For both the developers that leverage the Terracotta toolkit and the people who use the stuff built using the Terracotta toolkit
- Stable API matters - We are building a compatibility kit and will maintain a strict and clear versioning scheme so that framework developers can rely on it and clearly know which versions of Terracotta work with the API version used in their application. Your users can just drop in any version of the terracotta-toolkit.jar that implements the version of the API you coded against.
- Parts is Parts - Get all the useful parts we use to build our products packaged and out for others to use.
- Scale Continuum - The parts should work both clustered and unclustered continuing our vision of a scale continuum.
Monday, April 26, 2010
Dave Klein's Scale Grails Webinar
Thursday, April 22, 2010
<terracotta clustered="true"/>
Tuesday, April 20, 2010
Ehcache 2.1 Beta - Lots of Stuff, Still Backward Compatible
- Build on our vision of an application scale continuum from one node to the cloud.
- Improve Ehcache performance both unclustered and clustered.
- Improve Ehcache's applicability both unclustered and clustered.
- The Explicit Locking Module - This module allows you to acquire and release locks manually for given keys. It required some significant rework in the unclustered stores, but it now works just as well unclustered as it does clustered, supporting fully coherent operations.
- JTA - In Ehcache 2.0 we added JTA support when clustered via Terracotta. In 2.1 we extended that functionality to unclustered caches and have begun the process of performance tuning to go along with its XA compliance.
- JTA for Hibernate Second Level Cache - We added support for using Ehcache JTA in a second level cache both clustered and unclustered.
- UnlockedReadsView - This is a subtle but important feature. For those who are using a coherent cache but have some part of an application that needs to read at high rates without impacting the rest of the cache, this view is a huge help.
- NonStopCache - Useful for guaranteeing that your cache can never stop your application. On a per cache basis an application can avoid holdups caused by problems such as a slow disk in an unclustered cache or a network outage in a clustered one.
- New Coherent methods - We've added useful methods like putIfAbsent and replace to work simply and easily with a clustered or unclustered cache in a fully coherent manner. Together with the explicit locking wrapper, much is possible (see the sketch after this list).
- We also added a bunch of tests and bug fixes to the web-cache, an extremely useful tool for making performant web applications.
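As an illustration of the new coherent methods mentioned above, a minimal sketch; the "sample" cache is created from the default cache settings just to keep the example self-contained:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class CoherentMethodsSample {
    public static void main(String[] args) {
        // Create a cache from the default cache settings so the sample is self-contained
        CacheManager manager = CacheManager.create();
        manager.addCache("sample");
        Cache cache = manager.getCache("sample");

        // putIfAbsent: only stores the element if no mapping exists;
        // returns the element that was already there, or null if this put won
        Element previous = cache.putIfAbsent(new Element("key", "first"));
        System.out.println("previous mapping: " + previous);

        // replace: swaps in the new element only if the old one is still the current value
        boolean replaced = cache.replace(new Element("key", "first"), new Element("key", "second"));
        System.out.println("replaced: " + replaced);

        manager.shutdown();
    }
}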
Monday, April 19, 2010
Application Server Instead Of Web Server
- Less infrastructure
- Java doesn't have buffer overflows and is a bit more secure
- You can do much more interesting caching by having the web serving and app serving in the same layer