A New Model for Caching in ColdFusion (?)

NOTE: There has been new insight into the content of this article which invalidates much of what I originally wrote here. I'm leaving the article untouched for posterity, but I want everyone to know that these results are not definitive. If you're curious, test the code yourself. ;) I will post a new article soon that explains the new information, but I don't have the time to do that today.

Caching is something that just about anyone who works with ColdFusion does to some extent... and that's probably true of most programming languages. A little while ago I realized that my own caching routines were a bit primitive and decided to upgrade them, and about that time I also realized that, gosh, the ColdFusion community as a whole is duplicating a lot of effort around this notion of caching. There are several different cache-management projects on RIAForge, each with a different bent of course. There's a Java softcache project and not one but TWO separate memcached projects (edit: one of them has since been removed). Some of you already know that this inspired me to start working on a caching framework. And yeah, I know I just said we're duplicating a lot of effort, so it sounds like I'm just adding to the problem, but I don't think so. My plan is to create a caching framework that handles all the use cases covered by the various techniques currently in use, and adds auto-configuring algorithms on top of that, so that you don't have to worry about choosing between different caching frameworks or techniques. If it works as planned, hopefully various other frameworks like ColdBox will begin to use CacheBox instead of maintaining their own caching code, and the end result will be both a standard and a HUGE time-saver for the ColdFusion community at large.

Okay, that's the background information. Now for the task at hand. In order for me to make this work, I need to find a basic model for caching to use as a foundation on which the various use-cases can be layered.

Most folks who are writing caching routines in ColdFusion environments today are using structures to hold the cache. And most of the time this works. Sometimes it doesn't.

Back in January, Hal Helms reposted an article he wrote about testing our work. In it he mentions a debacle that happened with Toys-R-Us a while back. The developers hired to work on the project had decided to cache all the products in their database, which Hal describes as an "outside the box" solution. Is it?

I don't think the Toys-R-Us programmers were actually doing anything particularly unusual. I've worked on a lot of projects that used that technique, and honestly most of the time it works. It does, however, fail occasionally under load because of the way structures work. I described the problem specifically not long ago on the DataFaucet blog.

In short, structures are very efficient for small numbers of elements. As the number of keys increases, you get collisions, and the work that's done to manage those collisions degrades performance until the structure actually becomes less efficient than other solutions.

When I say "small", I'm not talking about 10 or 20, I'm talking about thousands.

But what if you actually do need to cache many thousands of something? It could happen. And of course RAM keeps getting cheaper, and the underlying Java memory limits will continue to expand, making more of that memory available to us. So we may as well explore what's coming for us in terms of caching and scaling over the next few years.

In his article, Hal actually hinted at what turns out to be a pretty decent solution. He describes it as an "in memory RDBMS". I think he was imagining something a little different from what I'm creating, but here's the guts of it.

I did some very rudimentary testing today and have found some important differences in the performance of structures versus the performance of queries. The use of queries and query-of-query in ColdFusion seems to be a much more robust model for scaling up with regard to caching.

This is not load-testing. I would love for someone to run a load-testing tool against this code to give us some better benchmarks, but I don't have the equipment. As a matter of fact, my old notebook is badly in need of replacement, and for that reason I have the memory settings on my JVM tuned way, way down... to something like half the defaults, maybe less, because I was almost constantly running over 50% page file usage.

That being said, I wrote a few templates that create either a structure or a query and then populate it with an arbitrarily large number of entries. Each entry contains several items, including an id, a hit count, the first and last times the entry was touched, and an object.

I didn't actually create separate objects; I just created one object and injected a pointer to that same object into each record in the query (or structure) to keep my memory consumption down and let me work with larger numbers of entries in the query (or struct). I then output the time it took to generate the query or struct, and loop over a fetch operation until 1000ms have elapsed to see how many fetches I can perform per second.
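For illustration, here's a minimal sketch of that kind of harness for the query version. This is my own reconstruction, not the downloadable test code; the variable and column names (cacheQuery, idx and so on) are placeholders I made up, and it assumes query-of-query will carry the shared object column through, which is how the test behaved for me.

<!--- build the query cache: one row per entry, every row pointing
      at the same shared object to keep memory consumption down --->
<cfset entryCount = 10000 />
<cfset sharedObject = structNew() />
<cfset cacheQuery = queryNew("id,hits,firstHit,lastHit,idx,object") />

<cfset started = getTickCount() />
<cfloop from="1" to="#entryCount#" index="i">
	<cfset queryAddRow(cacheQuery) />
	<cfset querySetCell(cacheQuery, "id", "entry-" & i) />
	<cfset querySetCell(cacheQuery, "hits", 0) />
	<cfset querySetCell(cacheQuery, "firstHit", now()) />
	<cfset querySetCell(cacheQuery, "lastHit", now()) />
	<cfset querySetCell(cacheQuery, "idx", i) />
	<cfset querySetCell(cacheQuery, "object", sharedObject) />
</cfloop>
<cfoutput>created in #getTickCount() - started#ms<br /></cfoutput>

<!--- count how many query-of-query fetches complete in one second --->
<cfset fetches = 0 />
<cfset started = getTickCount() />
<cfloop condition="getTickCount() - started LT 1000">
	<cfquery name="fetched" dbtype="query">
		SELECT * FROM cacheQuery
		WHERE id = 'entry-#randRange(1, entryCount)#'
	</cfquery>
	<cfset fetches = fetches + 1 />
</cfloop>
<cfoutput>#fetches# fetches per second</cfoutput>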

Finally, I perform a purge operation to remove expired cache entries, which is where I think this gets really interesting. Purging the structure requires us to loop over each individual structure key, although yanking each expired entry is very easy. Using the query allows us to yank many entries at once (without looping over the query), which is much faster. In order to track the hit counts, though, we need to maintain an extra "index" column that just contains the value of currentrow, and when we yank our expired cache entries we have to update that index, which adds to the cost of the purge. So the question is really interesting: is it faster to loop over the structure and purge individual entries, or to purge them all at once and then update the index?
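To make the comparison concrete, here's a rough sketch of the two purge operations, assuming expiration is judged against the lastHit column from the sketch above, with cacheStruct as a parallel structure-based cache whose entries carry the same fields (the 30-minute cutoff is arbitrary):

<!--- structure purge: every key has to be visited individually;
      looping over a copy of the key list avoids modifying the
      structure while iterating over it --->
<cfset cutoff = dateAdd("n", -30, now()) />
<cfset keys = structKeyArray(cacheStruct) />
<cfloop from="1" to="#arrayLen(keys)#" index="k">
	<cfif dateCompare(cacheStruct[keys[k]].lastHit, cutoff) LT 0>
		<cfset structDelete(cacheStruct, keys[k]) />
	</cfif>
</cfloop>

<!--- query purge: one query-of-query keeps every row that's still
      valid, so the expired rows simply fall out of the result --->
<cfquery name="cacheQuery" dbtype="query">
	SELECT *
	FROM cacheQuery
	WHERE lastHit >= <cfqueryparam value="#cutoff#" cfsqltype="cf_sql_timestamp" />
</cfquery>
<!--- ...followed by the index rebuild discussed below --->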

Benefits

Using a query seems to consume less memory, and it is generally more stable and scales much better than using a structure.

At low numbers in cache (10k) the query is usually a little faster to create than the structure, although not by much. It takes close to one second to create the query (adding each row individually) and close to two seconds on average to create the structure. But remember, we're talking about 10k records spread out over who knows how many individual requests, so that difference isn't really worth mentioning.

Purging the cache is definitely faster with the query, in spite of the need to reindex. With 10k records the query purged in about 30ms, compared to about 600ms to purge the structure. This was removing approximately 20% of the total cache, and again, given that purging is only likely to happen every few minutes, this isn't a huge issue and probably not really worth changing any of our code for, even though it's roughly a 20x improvement in performance. I will say that as the size of the cache increased, the query also scaled up better. With 20k records the structure purged in about 2.5 seconds (roughly 4x the cost for 2x the volume), whereas at 50k records the cost of the query purge had only increased to about 200ms (roughly 7x the cost for 5x the volume), which is much closer to linear growth. Remember that with the query we're able to simply fetch all the records that are still valid via a query-of-query, whereas with the structure we have to loop over the individual structure keys, fetch each key, and perform logic on its sub-keys. Apparently that's more costly than rebuilding the query index.

(I actually tried a couple of different methods to rebuild the query index, and it turns out that the "brute force" method of just looping over the query and resetting the value is the most efficient! And that doesn't change with the size of the query, either. That surprised me.)
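For reference, the brute-force reindex is just a loop like this (again using my placeholder column name idx):

<!--- reset the index column so each row's value matches its new position --->
<cfloop query="cacheQuery">
	<cfset querySetCell(cacheQuery, "idx", cacheQuery.currentRow, cacheQuery.currentRow) />
</cfloop>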

What surprised me most, however, was the memory consumption of the structures as they scaled up. They continued to get slower and slower as I expected, presumably for the reasons I described before: as the collisions become more of an issue, the problem grows geometrically.

At 40k entries I was just barely able to create the structure, in about 23 seconds. That's roughly 10x the cost for 4x the number of entries. Obviously the underlying objects were struggling under the load and having to do a lot more work. Unfortunately, at that point I was unable to perform enough fetch operations to determine how many would occur in a given second, because the loop produced a Java heap "out of memory" error partway through. At 50k entries I was unable to build the structure at all with my JVM settings tweaked down the way they are.

Comparatively, at 50k records the query was still chugging along just fine, handling everything I threw at it. I was able to create the query in about 12 seconds (half the time it took to create 40k entries in the struct).

The query model also provides us with a convenient ability to perform some selective purging very rapidly. Let's say that we've cached a number of different things that are related -- several different views onto a particular object. Now let's say that we modified the object, and so we want to manually purge all the cache for that object. If we were using a structure, we would have a couple of choices.

One, we could loop over the structure, incurring the full purge cost each time we make a change, or two, we could make the caching mechanism more complex by adding sub-structures and purging the parent. This latter technique (what the onTap framework does currently) causes problems, however, when we want to see what's in the cache. We can't easily maintain information about how many items we're caching or how many hits they're getting, or report on that information, because of the complexity of the sub-structure. Sub-structures may also degrade performance, with each nesting level incurring its own penalty.

The query, on the other hand, solves both problems at once. It allows us to maintain and report on all the statistical information we want about use times and hit counts without adding any real complexity, and in addition it actually simplifies a number of race conditions that have to be accounted for with the more complex approach I've been using. If we want to report on the statistical information, generating the report will be much more efficient than any structure-based approach could ever hope to be. And if we want to selectively purge several related items from the cache, we can do that by specifying a pattern to purge, such as "object-type/#id#/%", and then any records in the cache that begin with the specified object type and id can be quickly and easily purged. My caching systems in the onTap framework do that now, but the code for doing it is much more complex than this query approach and makes it a lot harder to do any kind of reporting, so I'm really excited about this change personally.
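Here's a hedged sketch of that selective purge, assuming the cache keys follow the "object-type/id/view" convention just described (the objectId variable is hypothetical):

<!--- purge every cached view of one object in a single operation by
      keeping only the rows whose keys do NOT match the pattern --->
<cfset purgePattern = "object-type/#objectId#/%" />
<cfquery name="cacheQuery" dbtype="query">
	SELECT *
	FROM cacheQuery
	WHERE id NOT LIKE <cfqueryparam value="#purgePattern#" cfsqltype="cf_sql_varchar" />
</cfquery>
<!--- then rebuild the index column as shown earlier --->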

Drawbacks

The only drawback to using the query instead of a structure shows up when performing fetch operations. But given the mountain of benefits, I think the performance cost is worth it. The performance of fetch operations with either solution degrades gradually as the size of the cache increases; the question is how much, and whether it's enough to be problematic.

It actually takes quite a lot of content to make a fetch operation with a structure take any time at all. With 10k records, the structure in my case was performing over 100k fetch operations per second. With the same volume, the query was performing about 200 fetches per second. Now if you compare those directly, it's true that the structure is fetching roughly 500 times faster than the query when there are 10k records.
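That gap makes sense when you put the two fetch operations side by side: the structure fetch is a single hash lookup, while the query-of-query has no index on the id column and has to scan rows to find a match. A minimal sketch, using the same placeholder names as above:

<!--- structure fetch: one hash lookup --->
<cfset entry = cacheStruct["entry-42"] />

<!--- query fetch: a scan of the id column under the hood --->
<cfquery name="entry" dbtype="query">
	SELECT * FROM cacheQuery WHERE id = 'entry-42'
</cfquery>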

Realistically however I don't think this is an issue.

An individual page request isn't going to be attempting to fetch thousands of pieces of cache; it's going to be attempting to fetch maybe a dozen or so at most. At about 200 fetches per second, each query fetch costs roughly 5ms, so if you string together a dozen or so fetch operations, you're talking about 50-60ms for all the fetch operations for the entire request. A human eye blinks in about 1/10th of a second. So the time spent performing ALL the fetch operations for a given page, even with as many as 10k records in cache, is around half the time it takes to blink. Even with the relatively slow performance of the query fetch, you could perform all the fetch operations for a couple of pages in the span of one blink of your eye.

Yes it's true that the structure could perform the fetch operations for a thousand pages in the same time... however... why would that matter? That's way more than you'll ever possibly need. Unless you've got a use case in which you actually need to perform thousands of fetch operations on a single page (which isn't very likely), I don't think anyone will notice. ;) Worrying about the performance of the query-fetch would basically be like worrying that UUIDs aren't unique enough. Sure, you could create an algorithm that produces an even more unique value, but nobody does because nobody has a need for it.

The only way I could see running into an issue with either of them would be to create a UUID for each gene in the human genome and then cache an object to represent each one. But at that point you would run out of memory anyway. ;)

As the volume of the cache increases beyond 10k, the query option continues to slow down. At 50k records, in my case, it was performing only about 70 fetch operations per second (roughly 14ms per fetch). Remember, however, that this was the point at which I was totally unable even to build a structure with that number of entries, because the server fell over with an out-of-memory error after 3 minutes.

Conclusion

So personally I think it's pretty conclusive that, in spite of the cost of fetching, the query is a much more robust and useful model for caching than the use of structures that's common in the ColdFusion community. I would, however, love to hear your thoughts. It's certainly possible there are things I've overlooked. And I'd love for someone to put my code here under analysis with a legitimate load-testing tool. There's a download in the links at the bottom of this article, between "print" and "linking blogs".

Thanks! :)

Comments
Ike,

This looks really promising. I would love to see some implementation code for how to use CacheBox.
# Posted By Steve Bryant | 11/30/08 9:59 PM
Thanks Steve. :) Well this bit is of course just a foundational element. The query by itself doesn't cover all the potential use cases. As of yet I haven't added enough code to the CacheBox framework core to get it to a functional state. I started putting some general ideas into code and put them in the SVN repository, but at this point I'm thinking that some of what I've put in there is going to be replaced. Partly it will be replaced by this query method, but also I think I may need to revise the agent a little bit.

Part of the idea behind the framework is that other applications will be able to use an "agent" CFC without first installing CacheBox, so that the whole caching process becomes fairly seamlessly separated from the rest of the app -- basically, your app doesn't need to really know when or where or how things are cached, although you will have the ability to supply "hints" and the framework will work backward to fulfill the hint or will degrade to a next-best scenario. But getting back to not having working code and needing to revise the agent -- in order for this to work properly I think the agent needs to be really minimal, to limit the potential that it might need to change or that changes to the agent might not be backward compatible (like recent SVN client changes). And as I've thought about the project over the past month or so I think the mental picture for simplifying the agent has solidified better for me and now is a bit simpler / better than it was when I started. :)

Anyway, Rob Brooks-Bilson mentioned posting his notes from his Max presentation tomorrow and I want to read those before I get further into the development on CacheBox. :)
# Posted By ike | 11/30/08 10:24 PM
@Ike,

This is some cool stuff. I would definitely think that Queries would be slower to create and use than structures. I am very surprised that structs seem to fail under large volume. I suppose it has to do with the buckets that the keys get put into (trying to remember back to my computer science classes)? Very cool stuff.
# Posted By Ben Nadel | 12/1/08 9:40 AM
@Ben - yeah, right, I think we sort of habitually think of structs as being super-fast in all circumstances, just because under the right circumstances (small numbers) they are so efficient, often way more so than they need to be. But the large-volume issue turns up because they convert your string key into a single integer, and to do that quickly the hash function is a little sloppy, not guaranteeing uniqueness. Once you start getting several strings stored in the same numeric index ("bob", "joe" and "mary" are all in position 5 of the array, for example), then any one of the several methods they use to resolve that collision (plugging the leak in the abstraction) can become much less efficient. What surprised me more than that was the memory consumption - I just didn't expect them to consume much memory at all, but apparently they do consume more than growing a query.
# Posted By ike | 12/1/08 11:56 AM
This blows my mind. I'm glad there are smart people like you on the front lines here exposing all this stuff. Ever consider writing a book?
# Posted By Brock Baxter | 12/3/08 8:49 PM
Thanks Brock. :) I have thought about it actually, though I've never gotten around to sitting down and writing any longer volumes on programming... The documentation for all my projects is getting close though. ;)

Although I have been working on a non-programming book over the past year or so called the Optimist's Wager. That particular book is mostly about cognitive science and general probability, although I didn't work on any of the science; I'm just compiling research from other sources. More on that here if you're interested: http://smolderingremains.deviantart.com/art/Optimist-s-Wager-Ch-1-82212934
# Posted By ike | 12/3/08 10:57 PM
@Ike,

After reading this, I wanted to do some experimentation with caching actual CFML code and then accessing it via CFInclude tags (ultimately wrapped in a Custom Tag or something probably). It is slower than in-memory access, as to be expected, but I wonder how it would hold up on a high-traffic, database-driven site:

http://www.bennadel.com/blog/1422-experimenting-with-flat-file-coldfusion-cfml-caching.htm
# Posted By Ben Nadel | 12/5/08 9:49 AM
Hey Ben, thanks for the heads-up! :)
# Posted By ike | 12/5/08 11:53 PM
@ike, I want to recommend some software that a couple of my clients use quite heavily; it's called Dragon NaturallySpeaking 10. Trust me when I tell you that you will create tons of content for your books, blog, whatever.
# Posted By Hatem Jaber | 12/6/08 2:28 PM
Thanks Hatem. :) I actually had a copy of Dragon NaturallySpeaking several years ago, and although it seemed like a decent product, I didn't keep up with it enough at the time to get it to the point where it was very good at recognizing my voice. I was still struggling with getting it to accept a sentence or two without having to say "back xxx back xxx back xxx back xxx" on several words until it got the right word. Of course people have no problem understanding me when I talk. ;) But I assume the technology has improved a great deal since the last time I tried it, which I think was around version 6. I'll have to look into it again. :)
# Posted By ike | 12/6/08 6:30 PM
@ike, I should have mentioned that both clients had a hard time with the previous versions, the version 10 that they are on now is far superior according to these guys.
# Posted By Hatem Jaber | 12/7/08 9:24 AM
Thanks to everyone for the comments on this entry. I wanted to post another comment here so that those of you who subscribed will know that I discovered a problem with the information in the article. I don't have time to post a follow-up today but I plan to post one soon that will explain the whole thing. :)
# Posted By ike | 12/12/08 2:27 AM
I'm curious,

What is the insight that invalidates what you wrote above? I looked through your other posts and didn't see anything. Can you at least explain what was wrong with the approach?
# Posted By Kile | 1/22/09 7:15 PM
@Kile - all the performance testing in this article is invalid. The query is actually much slower than using the structure, and the structure is viable for a much larger number of entries than indicated in this article. The reason the test turned out this way is that I had memory tracking enabled in the ColdFusion server monitor. The memory tracking issue is mentioned in the "weird performance issue resolved" article, but it doesn't mention structures specifically.
# Posted By ike | 1/22/09 8:05 PM