NOTE: There has been new insight into the content of this article which invalidates much of what I originally wrote here. I'm leaving the article untouched for posterity, but I want everyone to know that these results are not definitive. If you're curious, test the code yourself. ;) I will post a new article soon that explains the new information, but don't have the time to do that today.
Caching is something that just about anyone who works with ColdFusion does to some extent, and that's probably true of most programming languages. A little while ago I realized that my own caching routines were a bit primitive and decided to upgrade them. About that time I also realized that, gosh, the ColdFusion community as a whole is duplicating a lot of effort around this notion of caching. There are several different cache-management projects on RIAForge, each with a different bent of course. There's a Java softcache project and not one, but TWO separate memcached projects (edit: one of them has since been removed).

Some of you already know that this inspired me to start working on a caching framework. And yeah, I know I just said we're duplicating a lot of effort, so it sounds like I'm just adding to the problem, but I don't think so. My plan is to create a caching framework that handles all the use cases covered by the various techniques currently in use, and adds auto-configuring algorithms on top, so that you don't have to worry about choosing between different caching frameworks or techniques. If it works as planned, hopefully other frameworks like ColdBox will begin to use CacheBox instead of maintaining their own caching code, and the end result will be both a standard and a HUGE time-saver for the ColdFusion community at large.
Okay, that's the background information. Now for the task at hand. In order for me to make this work, I need to find a basic model for caching to use as a foundation on which the various use-cases can be layered.
Most folks who are writing caching routines in ColdFusion environments today are using structures to hold the cache. And most of the time this works. Sometimes it doesn't.
Back in January, Hal Helms reposted an article he wrote about testing our work. In it he mentions a debacle that happened with Toys-R-Us a while back. The developers hired to work on the project had decided to cache all the products in their database, which Hal describes as an "outside the box" solution. Is it?
I don't think the Toys-R-Us programmers were actually doing anything particularly unusual. I've worked on a lot of projects that used that technique, and honestly most of the time it works. It does however fail occasionally under load because of the way structures work. I described the problem specifically not long ago on the DataFaucet blog.
In short, structures are very efficient for small numbers of elements. As the number of keys increases you get collisions, and the work that's done to manage those collisions degrades performance until the structure actually becomes less efficient than other solutions.
When I say "small", I'm not talking about 10 or 20, I'm talking about thousands.
But what if you actually do need to cache many thousands of something? It could happen. And of course RAM keeps getting cheaper and the underlying Java limitations will continue to expand, making more of that memory available to us. So we may as well explore what's upcoming for us in terms of caching and scaling in the next few years.
In his article, Hal actually hinted at what turns out to be a pretty decent solution. He describes it as an "in memory RDBMS". I think he was imagining something a little different than what I'm creating, but here's the guts of it.
I did some very rudimentary testing today and have found some important differences in the performance of structures versus the performance of queries. The use of queries and query-of-query in ColdFusion seems to be a much more robust model for scaling up with regard to caching.
This is not load-testing. I would love for someone to run a load-testing tool against this code to give us some better benchmarks, but I don't have the equipment. As a matter of fact, my old notebook is badly in need of replacement, and for that reason I have the memory settings on my JVM tuned way, way down... to about half the defaults, maybe less, because I was almost constantly running over 50% page file usage.
That being said, I wrote a few templates that create either a structure or a query and then populate it with an arbitrarily large number of entries. Each entry contains several items including an id, a hit count, the first and last time the entry was touched, and an object.
I didn't actually create separate objects; I created one object and injected a pointer to that same object into each record in the query (or structure) to keep my memory consumption down and let me work with larger numbers of entries. I then output the time it took to generate the query or struct, and looped over a fetch operation on each until 1000ms had elapsed, to see how many fetches I could perform per second.
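My test templates aren't reproduced here, but a minimal sketch of the query-based version might look something like this (all of the names are mine, and this is a simplification of what the actual test code does):

```cfml
<!--- sketch of a query-based cache; column and variable names are hypothetical --->
<cfset cache = queryNew("id,hits,firstHit,lastHit,expires,obj")>
<cfset sharedObj = createObject("component", "SomeComponent")>

<cfset start = getTickCount()>
<cfloop from="1" to="10000" index="i">
    <cfset queryAddRow(cache)>
    <cfset querySetCell(cache, "id", "entry-#i#")>
    <cfset querySetCell(cache, "hits", 0)>
    <cfset querySetCell(cache, "firstHit", now())>
    <cfset querySetCell(cache, "lastHit", now())>
    <cfset querySetCell(cache, "expires", dateAdd("n", 30, now()))>
    <!--- the same object in every row, to keep memory consumption down --->
    <cfset querySetCell(cache, "obj", sharedObj)>
</cfloop>
<cfoutput>populated in #getTickCount() - start#ms</cfoutput>

<!--- a single fetch is just a query-of-query by id --->
<cfquery name="hit" dbtype="query">
    SELECT * FROM cache WHERE id = 'entry-42'
</cfquery>
```

The structure version is the same idea with a struct of sub-structs keyed by id instead of query rows.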
Finally I perform a purge operation to remove expired cache, which is where I think this gets really interesting. Purging the structure requires us to loop over each individual structure key, but yanking the expired entry is very easy. Using the query allows us to yank many entries at once (without looping over the query), which is much faster. In order to track the hit counts we need to maintain an extra "index" column which just contains the value of "currentrow" and when we yank our expired cache, we have to update the index which then adds to the cost of the purge. So the question is really interesting - is it faster to loop over the structure and purge individual entries or to purge them all and then update the index?
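To make the comparison concrete, here's a rough sketch of the two purge strategies (variable and column names are my own, not from the actual test code):

```cfml
<!--- structure purge: loop over every key and delete expired entries one by one --->
<cfloop collection="#structCache#" item="key">
    <cfif dateCompare(structCache[key].expires, now()) lt 0>
        <cfset structDelete(structCache, key)>
    </cfif>
</cfloop>

<!--- query purge: a single query-of-query keeps all the still-valid rows at once;
      the "index" column then has to be rebuilt afterward, which adds to the cost --->
<cfquery name="queryCache" dbtype="query">
    SELECT * FROM queryCache
    WHERE expires >= <cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp">
</cfquery>
```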
Benefits

Using a query seems to consume less memory, and it is generally more stable and scales a lot better than using a structure.
At low numbers in cache (10k) the query is usually faster to create than the structure. It takes close to one second to create the query (adding each row individually) and close to two seconds on average to create the structure. But remember, we're talking about 10k records spread out over who knows how many individual requests, so that difference isn't really worth worrying about.
Purging the cache is definitely faster with the query, in spite of the need to reindex. With 10k records the query purged in about 30ms compared to about 600ms to purge the structure. This was removing roughly 20% of the total cache, and again, given that purging is only likely to happen every few minutes, this isn't a huge issue and probably not worth changing any of our code for, even though it's roughly a 20x improvement in performance. I will say that as the size of the cache increased, the query scaled up better too. With 20k records the structure purged in about 2.5 seconds (roughly 4x the cost for 2x the volume), whereas the query purge, even at 5x the volume, had only increased to about 200ms, indicating near-linear growth. Remember that with the query, we're able to simply fetch all the records that are still valid via a query-of-query, whereas with the structure we have to loop over the individual structure keys, fetch each key, and perform logic on its sub-keys. Apparently that's more costly than rebuilding the query index.
(I actually tried a couple different methods to rebuild the query index and it turns out that the "brute force" method of just looping over the query and resetting the value is the most efficient! And that doesn't change with the size of the query either. That surprised me.)
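For reference, the brute-force rebuild mentioned above is just a loop resetting the index column to currentrow (a sketch; "idx" is my name for the column):

```cfml
<!--- reset the index column so it matches currentrow again after a purge --->
<cfloop query="cache">
    <cfset querySetCell(cache, "idx", cache.currentRow, cache.currentRow)>
</cfloop>
```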
What surprised me most, however, was the memory consumption of the structures as they scaled up. They continued to get slower and slower, as I expected, and I assume for the reasons I described before: as collisions become more of an issue, the problem grows geometrically.
At 40k entries I was just barely able to create the structure, in about 23 seconds. That's 10x the cost for 4x the number of entries, so obviously the underlying objects are struggling under the load and having to do more work. Unfortunately, at that point I was unable to perform enough fetch operations to determine how many would occur in a given second, because it produced a Java heap "out of memory" error in the middle of the loop. At 50k entries I was unable to build the structure at all with my JVM settings tweaked down the way they are.
Comparatively, at 50k records the query was still chugging along just fine, handling everything I threw at it. I was able to create the query in about 12 seconds (half the time it took to create 40k entries in the struct).
The query model also gives us a convenient ability to perform selective purging very rapidly. Let's say we've cached a number of different things that are related: several different views onto a particular object. Now let's say we modified the object, so we want to manually purge all the cache for that object. If we were using a structure, we would have a couple of choices.
One, we could loop over the structure, incurring the full purge cost each time we make a change; or two, we could make the caching mechanism more complex by adding sub-structures and purge the parent. This latter technique (what the onTap framework does currently) causes problems, however, when we want to see what's in the cache. We can't easily maintain information about how many items we're caching or how many hits they're getting, or report on that information, because of the complexity of the sub-structure. Sub-structures may also degrade performance, with each nesting level incurring its own penalty.
The query, on the other hand, solves both problems at once. It allows us to maintain and report on all the statistical information we want about use times and hit counts without adding any real complexity, and in addition it actually simplifies a number of race conditions that have to be accounted for with the more complex approach I've been using. If we want to report on the statistical information, generating the report will be much more efficient than any structure-based approach could ever hope to be. And if we want to selectively purge several related items from the cache, we can do that by specifying a pattern to purge, such as "object-type/#id#/%", and then any records in the cache that begin with the specified object type and id can be quickly and easily purged. My caching systems in the onTap framework do that now, but the code for doing it is much more complex than this query approach, and it makes it a lot harder to do any kind of reporting, so I'm personally really excited about this change.
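Assuming the cache keys follow that kind of path convention, the selective purge could be sketched as a single query-of-query (the names here are hypothetical):

```cfml
<!--- keep every row whose key does NOT match the pattern for this object --->
<cfset pattern = "object-type/#id#/%">
<cfquery name="cache" dbtype="query">
    SELECT * FROM cache
    WHERE id NOT LIKE <cfqueryparam value="#pattern#" cfsqltype="cf_sql_varchar">
</cfquery>
<!--- the index column would then need rebuilding, as with an expiration purge --->
```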
Drawbacks

The only drawback to using the query instead of a structure shows up when performing fetch operations, but given the mountain of benefits, I think the performance cost is worth it. The performance of fetch operations with either solution degrades gradually as the size of the cache increases. The question is how much, and whether it's enough to be problematic.
It actually takes quite a lot of cached content to make a fetch operation on a structure take any measurable time at all. With 10k records, the structure in my case was performing over 100k fetch operations per second. With the same volume the query was performing about 200 fetches per second. So in relative terms, yes, the structure is fetching over 500 times faster than the query when there are 10k records.
Realistically however I don't think this is an issue.
An individual page request isn't going to be attempting to fetch thousands of pieces of cache; it's going to fetch maybe a dozen or so at most. At roughly 5ms per query fetch (200 per second), stringing together a dozen or so fetch operations comes to something like 50-60ms for the entire request. A human eye blinks in about 1/10th of a second, so even under a load of as many as 10k records, ALL the fetch operations for a given page would take only about half the time it takes to blink. Even with the relatively slow performance of the query fetch, you could perform all the fetch operations for a couple of pages in the span of one blink of your eye.
Yes, it's true that the structure could perform the fetch operations for hundreds of pages in the same time... however... why would that matter? That's way more than you'll ever possibly need. Unless you've got a use case in which you actually need to perform thousands of fetch operations on a single page (which isn't very likely), I don't think anyone will notice. ;) Worrying about the performance of the query-fetch would basically be like worrying that UUIDs aren't unique enough. Sure, you could create an algorithm that produces an even more unique value, but nobody does because nobody has a need for it.
The only way I could see running into an issue with either of them would be to create a UUID for each base pair in the human genome and then cache an object to represent each one. But at that point you would run out of memory anyway. ;)
As the volume of the cache increases past 10k, the query option continues to slow down. At 50k records in my case it was performing only about 70 fetch operations per second (about 14ms per fetch). Remember, however, that this was the point at which I was totally unable to even build a structure with that number of entries, because the server fell over with an out-of-memory error after 3 minutes.
Conclusion

So personally I think it's pretty conclusive that, in spite of the cost of fetching, the query is a much more robust and useful model for caching than the structure-based approach that's common in the ColdFusion community. I would however love to hear your thoughts; it's certainly possible there are things I've overlooked. And I'd love for someone to put the code here under analysis with a legitimate load-testing tool. There's a download in the links at the bottom of this article, between "print" and "linking blogs".