ScaleAholic
Scaling software should be an activity done with ease by Hardware, Cloud and System Administrators without knowledge of the applications being scaled.
Friday, August 16, 2013
Nice Updates To The Druid Site
The gang that works on Druid has done some nice work on a new website, http://druid.io. The focus has been on clarifying what Druid is and isn't and on simplifying adoption. Lots more work to be done, but we are pleased with how it's going.
Friday, April 26, 2013
Meet the Druid and Find Out Why We Set Him Free
Whipped up a blog on Druid. Check it out.
Meet the Druid and Find Out Why We Set Him Free
Tuesday, September 25, 2012
What Would You Do Differently With Reliable In-Memory Big Data?
In The Beginning
Two years ago, Terracotta introduced a transformative new technology called BigMemory, delivering revolutionary advances in both scale and predictability for in-memory data management. Since then, our customers have leveraged BigMemory to eliminate tuning and achieve 1000x per-node jumps in scale, from about 2 GB per node to 2 TB per node, all while seeing lower, more predictable latencies.
Moving Forward
Since that time we've layered in additional powerful technologies like bytes-based on-heap tuning (ARC) and Search.
Once you start acting on that much data in-memory, it becomes important that you're able to keep it there. After all, a crash that requires you to rebuild gigs and gigs of data from other sources when the node restarts makes the application impractical. So we recently added a fast, fault-tolerant, restartable store (FRS).
With that kind of data one needs to see what's going on or you'll feel blind and helpless. So we followed up with a secure monitoring and management console to go along with it, the Terracotta Management Console (TMC).
The Problem
These pieces, when put together, create an extremely powerful in-memory data management solution. There was just one more problem. If people weren't used to storing large amounts of data in-memory/in-process, then how would they ever start to explore what's possible?
Resolution
We decided that the best way to change behavior was to make it free. So you can now download BigMemory Go and use it in production, with up to 32 gigs of in-memory data, on as many machines as you desire. And you won't pay us a dime.
So....
What Would You Do Differently With Reliable In-Memory Big Data?
I look forward to finding out. Download BigMemory Go free at http://terracotta.org.
Tuesday, March 6, 2012
How Do You Say "Ehcache?"
What's the most pressing issue at Terracotta? A challenge that has been dogging us for years? Why, of course, it's teaching people the proper pronunciation of "Ehcache." Have no fear. Here is a video that will help you get it right!
Wednesday, November 16, 2011
What's Apache licensed, improves performance, reduces tuning and is now GA?
I'm glad you asked... Ehcache 2.5 with ARC went GA today. At a couple of megs, Ehcache with ARC is lightweight and helps with real tuning and performance issues in Java applications.
Like Ehcache itself I'll keep this blog lightweight.
Here is why you care:
- No more OOMEs - Specify how much data you want to cache, in bytes, for on-heap unserialized objects. Ehcache keeps the cache under that size to avoid OOMEs.
- Tune at runtime, as needed - Specify the total heap to use for caching and then, when you feel like it, tweak individual caches for optimal performance.
- CacheManager - max bytes on heap (a percentage of the heap or a fixed number)
- Cache - max bytes on heap (a percentage of the CacheManager's allowance or a fixed number)
- Lots of caches made easy - Use hundreds of caches without sizing each one individually.
- Hibernate people, this means YOU.
- Did I mention NO OOMEs?
- Fence your heap - Like virtual machines, give your app's internals a percentage of the total heap and your users a percentage. Prevent the two from stomping on each other.
- People who ship software to others, TAKE NOTE.
- Pinning - Pin caches or entries that need to always be available and extremely fast to the heap.
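As a rough sketch, bytes-based sizing at both levels looks like this in ehcache.xml (cache names, sizes, and the 40% split are illustrative assumptions, not a recommendation):

```xml
<!-- Sketch only: names and sizes are assumed for illustration. -->
<!-- CacheManager-level pool: all caches share up to 40% of the heap. -->
<ehcache maxBytesLocalHeap="40%">

  <!-- No per-cache sizing needed; ARC balances it within the pool. -->
  <cache name="ordersCache"/>

  <!-- Or give one cache its own fixed on-heap budget. -->
  <cache name="refDataCache" maxBytesLocalHeap="64M"/>
</ehcache>
```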
Who Should Care?
- Hibernate users
- Lots of caches, impossible to tune each individually
- Spring Annotations users
- Pick the amount of heap you want to dedicate to caching and then just mark the methods that need caching
- Software Shippers (Frameworks, Servers or Apps)
- Don't force users to size caches
- Reserve part of the heap for your internal state and assign a safe size for users
- Application Admins
- Monitor an application in production and do sizing/pinning where needed on the fly
Conclusions
Ehcache and ARC are free, lightweight, and likely already in your application (embedded in Spring, Hibernate, ColdFusion, and countless other servers and frameworks). This is the most requested feature set in the history of Ehcache.
Tuesday, September 27, 2011
Spring Performance With Annotations and Ehcache ARC
The goal of this blog is to highlight Ehcache's new ARC feature using Spring Annotations.
Terracotta is releasing Ehcache 2.5. Its biggest new feature is ARC (Automatic Resource Control). ARC brings runtime performance tuning and caching to the Systems Administrator while simplifying it for the Developer. I've written a couple of blogs describing what it is and the value it brings:
These two blogs and the Ehcache docs are excellent sources of information on these highly requested/useful new capabilities.
Getting Started
I wanted to give a brief demonstration of how someone would leverage Ehcache ARC in practice, so I pinged the guys who do Spring Annotations and they OK'd me to leverage one of their samples. Spring Annotations enable one to mark methods that do performance-sensitive operations so their results are cached, improving the application's speed.
In order to follow along with me you'll want to read up on and check out the Spring Annotations sample here:
Download it and play with it a bit to get familiar with how it works.
This sample is fairly self-contained. It shows how to use the various annotations to improve performance in a simple and concise way. What it doesn't talk about is the configuration of the underlying cache in use. This ends up being very challenging in practice. Without help you'll run into at least a few of these 5 problems:
- Crash - Out of Memory Errors (OOMEs) can crash your application
- Pause - Long GCs will pause your application, leading to an unpredictable user experience
- Wasted Space - To avoid the above two problems you over-provision memory, wasting tons of space
- Tuning Hell - Either you or your users spend tons of time making arbitrary decisions, trying to balance the memory usage of your caches to avoid the above 3 issues
- Poor Performance - Caches don't really help because they are incorrectly sized
Let's look at the ehcache.xml that ships with this sample:
The important parts of this config are on lines 30-31: it sets maxElementsInMemory to 100 and timeToLiveSeconds to 300.
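The sample's ehcache.xml isn't reproduced here; a minimal sketch of the relevant cache element (the cache name is an assumption, not taken from the sample) would look roughly like this:

```xml
<!-- Sketch of the sample's entry-count-based config; name assumed. -->
<ehcache>
  <cache name="sampleCache"
         maxElementsInMemory="100"
         timeToLiveSeconds="300"
         eternal="false"/>
</ehcache>
```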
Why do these need to be set? Let's look at them individually:
maxElementsInMemory=100
When you started your Tomcat instance, some amount of memory was reserved on the command line (Xms, Xmx). If you didn't set maxElementsInMemory, your cache would grow without bound until long GCs and eventually an OOME occurred. So maxElementsInMemory is a resource management choice. But why 100? Is it because you know how big your entries are and you know 100 is exactly some percentage of the heap? More likely it's a guess. Probably a lowball guess, so that you don't OOME, wasting a whole bunch of heap. Now say you have tens or hundreds of caches that hold objects of varying size. How much harder is it to set maxElementsInMemory now?
timeToLiveSeconds=300
This is a data freshness control. If you're using an unclustered cache across multiple nodes, then objects are likely being updated in a DB and the data in your cache can get out of date. How out of date? That depends on what you set your timeToLiveSeconds to.
NOTE: Keeping all nodes up to date is one of the reasons people use clustered caches.
Configuring a cache using the above controls makes an application sensitive to use-case changes, changes in JVM heap settings, and the number of caches. It also leaves the user of the application with a bad choice: leave a lot of headroom to handle the worst case, risk OOMEs, or force the admin or user to tune at every deployment.
Ehcache ARC Gets Performance Without All The Tuning
These are the problems that Ehcache ARC, the most requested feature in Ehcache history, is designed to eliminate. Ehcache ARC makes your cache or caches controllable using the same metric the JVM is started with, the same resource you are trying to manage: bytes! In addition, instead of having to tune each individual cache, you can make a high-level choice at the CacheManager level and let ARC balance the resources for you.
In this change to the sample we have removed the cache entry count and instead allowed the cache to use 50% of the JVM heap (look at line 4, where it says maxBytesOnHeap). This setting tells the cache manager to allow up to 50 percent of the heap for caching (it can also be configured with a fixed number of bytes instead of a percentage). It will balance this heap across whatever caches are defined with this manager. This allows an application to manage the available resources without waste and without GC pauses/OOMEs.
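The changed config isn't shown here either; a sketch of the idea follows. The post references the attribute as maxBytesOnHeap; in the GA release of Ehcache 2.5 the corresponding attribute is maxBytesLocalHeap, which is what this sketch uses (the cache name is assumed):

```xml
<!-- Sketch only: pool 50% of the JVM heap for all caches. -->
<ehcache maxBytesLocalHeap="50%">
  <!-- Individual caches no longer need entry counts;
       ARC balances them within the 50% pool. -->
  <cache name="sampleCache" timeToLiveSeconds="300"/>
</ehcache>
```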
I updated and rebuilt my spring-annotations to use 2.5.0 and then rebuilt the sample using my newly created version of Ehcache Spring Annotations. You can grab the updated Jar here though I suspect the Spring Annotations guys will update the official build soon.
By using this approach, as you add more and more @Cacheable annotations to your application you don't have to change anything about the caches you create in ehcache.xml. They make use of the available cache resources as provided.
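For illustration, a method cached via Ehcache Spring Annotations looks something like this (the class, method, and cache names are assumptions for this sketch, not taken from the sample):

```java
import com.googlecode.ehcache.annotations.Cacheable;

public class WeatherService {

    // The return value is cached under a key derived from the arguments;
    // with ARC, the backing cache's heap usage is balanced automatically.
    @Cacheable(cacheName = "sampleCache")
    public String getForecast(String zipCode) {
        return expensiveLookup(zipCode);
    }

    // Stand-in for a slow DB or web-service call.
    private String expensiveLookup(String zipCode) {
        return "forecast for " + zipCode;
    }
}
```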
Wrapping Up
It has always been pretty easy to integrate a cache into an application. Things like Spring Annotations move the bar on that simplicity even further. The challenge was to make those caches effective: how does one get the most performance out of the resources available? Ehcache ARC was created to solve that problem for System Administrators, OEMs, Developers, and the entire OSS community. We are looking forward to people getting started with it and hope to receive lots of feedback.
Labels:
cache,
distributed cache,
ehcache,
GC,
heap,
Java,
memory,
Open Source
Location:
San Francisco, CA, USA
Monday, August 1, 2011
What Is Terracotta?
One of the biggest challenges in software is telling people what your software is in a way that helps them make decisions about it. This challenge can often be as difficult as designing and building the software itself. The Terracotta team recently hooked up with the gang from Epipheo studios to create a 2 minute video to do just that. We spent a lot of time thinking about and refining things in order to get a clear succinct message. I think it came out quite well.
Check it out: