This is the first in a series of posts where I’ll discuss caching techniques, and how we apply them in ColdFusion.
So, what is a cache?
A cache in software terms means exactly the same as it does in any English dictionary, at least in the sense that it’s a store of items. More specifically, it’s often intended to mean a store of items that are kept in memory for fast access. Which brings us to our next question…
Why cache at all?
Consider that database table you have in your application, the one that is almost never changed, the one that you spend 99% of your time reading from, and only 1% of your time writing to. Do you really need to pay the cost of running the query against your database server each time? How about we just keep it in memory and so provide almost instantaneous access to the data?
Do note, however, that a cache typically does not contain all the data from an underlying store - often, only a subset of data may be maintained based on the caching algorithm in use. For example, if the database table I spoke of earlier had terabytes of data, the cache would probably contain only a few megabytes of the table, based on requirements. If you’d like to take a bit of a detour here, check out in-memory databases - they’ve been around for a while, and much interesting work continues to be done in this area.
But back to my post…
Now that we know the whats and whys of caching, let’s spend a little time on the hows. Here’s an overview of some general caching techniques, with a brief dive into some specific techniques in use within ColdFusion. It’s just to give you (in case you need it!) a little taster and enough background to understand the later posts that will deal with the application of these techniques in ColdFusion.
Cost-based Caching
A cost-based cache often has a fixed size, and maintains items which have the highest cost to the application using the cache. The cost of an item may be computed based on a number of factors - some popular approaches are:
- The Least Recently Used (LRU) cache, which maintains, in opposition to its name, the most recently used items in the cache; the least recently used items are dropped out of the cache when new items are put into it.
- The Least Frequently Used (LFU) cache, which maintains the most frequently used items in the cache; the least frequently used items are dropped out of the cache when new items are put into it.
- A penalty-based cache, which maintains the items which take the most time (the highest penalty) to retrieve; items with lower penalties are dropped out of the cache when new items are put into it.
Any of the above mechanisms may be mixed and matched to create custom caching algorithms that make sense in particular contexts.
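To make the LRU idea concrete, here is a minimal sketch in Java. It leans on the access-order mode of java.util.LinkedHashMap; the class name LruCache and its capacity parameter are my own illustrative choices, not anything ColdFusion uses internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache sketch built on LinkedHashMap's access-order mode.
// The name LruCache and the capacity parameter are illustrative assumptions.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: entries are ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each put; returning true drops the least recently used entry
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" becomes the most recently used entry
        cache.put("c", 3);  // the cache is over capacity, so "b" is dropped
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Note that this sketch is not thread-safe; a cache shared between requests would need external synchronization or a different data structure.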
Memory Sensitive Caching
A memory sensitive cache is one which maintains its size based on the memory available to the process within which it is running. If the process has more free memory, the cache may grow to maintain more data; if the process starts doing some kind of memory-intensive computation, the cache may shrink itself to free space for the computation.
These are particularly easy to implement in Java, thanks to the java.lang.ref package. I’ll have to talk about garbage collection in Java before I explain this further.
The Java garbage collector is responsible for ensuring that objects which are no longer reachable (i.e., not in use) are cleared out so that the memory they occupy can be freed up for use. An object is considered to be dead when it is not reachable by traversing object references through a tree of live objects; a live, or strongly reachable, object is one which is referenced by a running thread, or by another live object.
Objects may also be programmatically wrapped in soft reference or weak reference objects, as defined in the java.lang.ref package. A softly reachable object is one which is not strongly reachable, but is reachable by traversing a soft reference. Softly reachable objects are cleared from memory by the garbage collector as and when it determines that the process requires more free memory. How a garbage collector makes this determination is implementation-dependent, though all soft references are guaranteed to be cleared before the JVM throws an OutOfMemoryError. A weakly reachable object is one which is neither softly reachable nor strongly reachable, but is reachable by traversing a weak reference; weakly reachable objects are cleared out the next time the garbage collector detects them, regardless of how much memory is free.
As is self-evident, soft references may be used to implement a memory sensitive cache. Weak references have their place too - these can be used to maintain a cache of metadata about objects maintained elsewhere, perhaps in a live thread, or in another cache, with the assurance that the metadata will be cleared out once the object ceases to be strongly referenced and the garbage collector next runs.
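As a rough sketch of the soft reference approach, the Java class below wraps each cached value in a SoftReference and uses a ReferenceQueue to drop the map entries whose values have been collected. The names SoftCache, Entry and purge are hypothetical, chosen for this illustration only.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// A memory sensitive cache sketch; SoftCache, Entry and purge are illustrative names.
class SoftCache<K, V> {
    // A soft reference that remembers its key, so the map entry can be removed later
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    public synchronized void put(K key, V value) {
        purge();
        map.put(key, new Entry<>(key, value, queue));
    }

    // Returns null if the key was never cached or its value has been collected
    public synchronized V get(K key) {
        purge();
        Entry<K, V> entry = map.get(key);
        return entry == null ? null : entry.get();
    }

    public synchronized int size() {
        purge();
        return map.size();
    }

    // Remove entries whose soft references have been cleared and enqueued
    @SuppressWarnings("unchecked")
    private void purge() {
        Entry<K, V> stale;
        while ((stale = (Entry<K, V>) queue.poll()) != null) {
            // Only remove the mapping if it still points at this exact reference
            map.remove(stale.key, stale);
        }
    }

    public static void main(String[] args) {
        SoftCache<String, byte[]> cache = new SoftCache<>();
        byte[] data = new byte[1024];
        cache.put("report", data);
        // While 'data' is strongly reachable, the cached value cannot be collected
        System.out.println(cache.get("report") == data); // prints true
    }
}
```

Without the queue, a collected value would leave its key behind in the map, so the map would keep growing even as the values disappeared.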
Cache Performance
Cache performance is often measured using the cache hit ratio. We say a cache is ‘hit’ when a requested item is available in the cache. Conversely, a cache is ‘missed’ when a requested item is not available in the cache, and needs to be retrieved from the underlying storage. The cache hit ratio is computed as hits / (hits + misses), that is, the proportion of requests to the cache that resulted in hits. A higher hit ratio indicates a more performant cache.
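For illustration, the hit ratio can be tracked with a pair of counters. The sketch below is in Java; the class name CacheStats is hypothetical, and returning 0.0 before any request has been recorded is my own convention.

```java
// A hit ratio tracker sketch; CacheStats is an illustrative name.
class CacheStats {
    private long hits;
    private long misses;

    void recordHit() { hits++; }

    void recordMiss() { misses++; }

    // hits / (hits + misses); defined here as 0.0 when nothing has been recorded yet
    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        CacheStats stats = new CacheStats();
        stats.recordHit();
        stats.recordHit();
        stats.recordHit();
        stats.recordMiss();
        System.out.println(stats.hitRatio()); // prints 0.75
    }
}
```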
However, there are other factors to consider when looking at cache performance as well - particularly the amount of memory consumed. A cache may have a high hit ratio simply because it’s maintaining a large proportion of data from the underlying storage in memory, which may in turn affect the performance of the process maintaining the cache, as the process may no longer have sufficient free memory to execute efficiently. Keep in mind that a well-designed cache must not only be performant in itself, but also must not adversely affect performance of the larger ecosystem that uses it.
Picking a cache
The nice thing about a memory sensitive cache is that, in general, it will never cause your process to run out of memory, as an oversized cost-based cache may do. The downside, though, is that when the process is starting to max out CPU cycles and memory utilization, a memory sensitive cache may become effectively non-functional: its contents will have been cleared, so all items will have to be retrieved from underlying storage rather than the cache, which may prove to be the proverbial straw that breaks the process’ back. A cost-based cache, on the other hand, will always have reliable performance, provided it’s been sized properly so that the hit ratio is sufficiently high.
Pick your caching strategy based on your application’s requirements - there is no one-size-fits-all approach!

Doug Hughes | 29-Jun-06 at 12:56 pm
Just wondering, is there any way to use soft references from within CFML? For example, let’s say that I have a structure that I’m caching objects in. What if I wanted the references in that object to be soft so that if Java needed to it could garbage collect them? Essentially, I’m asking if I could create an instance of a Widget cfc and then pass that into a SoftReference and then put the soft reference in the struct? Or, are there just too many references from CF objects to make that really work?
Thanks!
ashwin | 29-Jun-06 at 5:10 pm
I don’t see any reason why you wouldn’t be able to wrap CFC instances in a SoftReference. However, doing this is not as simple as wrapping up the instances and sticking them in a struct - what can happen is that the value in the struct would get collected by the garbage collector, while the key would remain. You would need to build on top of the ReferenceQueue class in java.lang.ref to ensure that keys get cleared out as well. I’ll try and throw a CFC together in the next day or two that’ll function as a soft cache and post it back here.
Stake Five :: Memory-sensitive Caching for CF | 01-Jul-06 at 2:12 pm
[…] After my introduction to caching post, Doug Hughes asked whether it would be possible to wrap CFCs in soft references to create a memory-sensitive cache for use in CFML. I answered that it should, in theory, be possible, but that an implementation would have to take care of a few common design issues that occur when dealing with soft references. […]
Stake Five :: Tangling with the Template Cache | 12-Jul-06 at 12:58 pm
[…] The ColdFusion template cache is an interesting beast; one which taught me a great deal about caching. I’ll be discussing the design of the template cache in this post, along with the rationale behind the approach that was taken, and some explanation of interesting behaviours that the template cache exhibits. Before we venture further, please do read An Introduction to Caching - you’ll need that information to understand the rest of this post. […]