A Strange Issue with CFC Performance
Generally speaking I'm not the first person to complain about performance. My response to performance issues generally is to figure out how well something must perform and then make it fill that need one way or another (often enough with some kind of caching). Which leads me to this article.
A little over a week ago I posted my last article about the potential for using queries as a mechanism for caching content in ColdFusion. The CacheBox project I'm working on is actually more involved than that, being designed to allow the cache to be stored outside the query, but it uses the query as a management tool either way, which makes reporting a lot easier. In my testing the query seemed to perform very well for every operation except the fetch, which was admittedly slower, but after calculating for what I feel are real-world use cases, it seemed even the fetch was reasonably fast enough and there were unquestionable gains in both durability and the performance of all the other operations including the store operation where the query is grown to accommodate new content.
This was great news. So I proceeded to apply this new technique to a couple of CFCs that are going into the CacheBox project. I had actually done a fair amount of ground work on the project by the time I got back to testing its performance and was then met with an odd and somewhat frustrating surprise.
Initially I expected there to be some additional performance cost for wrapping the query in a CFC (or two as the case may be). But I had rather assumed that the basic cost of making method calls on CFCs would be linear. That is to say that I expected the cost of each call to the CFC method to be roughly the same as the last one, irrespective of the number of calls. That didn't turn out to be the case. I was scaling up geometrically as had happened before with the structures. So when making only 100 calls to the store function it executed in about 2.5 seconds (not as fast as I'd have liked), but at only 300 calls to the same function, the cost to execute suddenly spiked to OVER A MINUTE! What?! An increase of 3x volume had produced an increase of 30x the execution cost!
This seemed rather odd to me, as the query had scaled up in a more linear fashion before. So I figured perhaps I had done something wrong in my CFCs. Then it occurred to me that the rather simple (and powerful) ArgumentCollection object we use in ColdFusion is actually a java object under the hood, just like the structure. And in fact, it's behavior is similar to that of a structure, with some additional features. Could it be that creating all those ArgumentCollection objects is what's causing the server to grind down like this?
So I threw together this CFC to test (notice that the method contain no logic, so if there are any performance issues present, they are inherent to the CFC):
<cfset instance = structNew() />
<cfset instance.cache = QueryNew("cachename,content") />
<cffunction name="init" access="public" output="false">
<cfreturn this />
</cffunction>
<cffunction name="fetch" access="public" output="false">
<cfargument name="cachename" type="string" required="true" />
<cfset var record = 0 />
<cfreturn record />
</cffunction>
<cffunction name="store" access="public" output="false">
<cfargument name="cachename" type="string" required="true" />
<cfargument name="content" type="any" required="true" />
<cfreturn arguments />
</cffunction>
</cfcomponent>
Then I ran this loop to see how it scaled up when the only action being performed was calling a method on a CFC.
<cfloop index="x" list="100,300,1000,3000">
<cfset ticks = getTickCount() />
<cfloop index="y" from="1" to="#x#">
<cfset agent.store("id_#y#","hello world") />
</cfloop>
<cfset time = getTickCount()-ticks />
<cfoutput>
<div>#x# records = #numberformat(time)#ms (#decimalformat(time/x)#ms / record)</div>
</cfoutput><cfflush />
</cfloop>
You can see here that all I'm doing is starting with a small number of method calls and then increasing the number of method calls by roughly 300% on each iteration of the loop. Then I'm calling the method the specified number of times with a couple of arguments and timing the results. (And I made sure that I disabled debugging and robust exceptions in the CF Admin before I tested this, knowing that they can inflate performance costs.)
Here is the end result of that test:
300 records = 469ms (1.56ms / record)
1000 records = 9,328ms (9.33ms / record)
3000 records = 154,844ms (51.61ms / record)
Bam! As I suspected, the basic function calling by itself is scaling up not in a linear fashion like I wanted, but in a geometric fashion like I thought after I started testing.
I have to hand it to Glenn Crocker. Somebody figure out what that guy's drinking, 'cause I want some! He provided an instant solution. And although I could say "that's dumb, that shouldn't be the solution" (because it is, and it shouldn't), right now I'm just happy that it works. :)
The answer?
Put the agent CFC in the application scope.
That's it... just change it to a persistent scope instead of running it in the variables scope of the page.
Instantly when I did that, the performance cost of calling the store method changed to this:
300 records = 31ms (0.10ms / record)
1000 records = 110ms (0.11ms / record)
3000 records = 171ms (0.06ms / record)
See that? Linear growth. The performance is consistently an average of about 0.1ms per call to the method.
Apparently when you put a CFC in the variables scope and start calling methods on it, it places those method calls in a stack. And of course as the stack grows (even though it ought to grow and shrink in this loop), the performance of the next iteration of the method suffers.
Which reminds me of something else I saw a little while ago about people struggling with a perceived memory leak in ColdFusion (although they never definitively said they had proved it had a leak, merely that they were almost certain). And again, at the time, the issue seemed to be when CFCs were placed into the variables scope of a page instead of being placed in a persistent scope like the application scope and used from there.
I wonder if anyone else has submitted a bug-report to Adobe for this issue? It certainly feels like a bug to me... although I'm not certain the folks at Adobe would classify it that way. Perhaps they're not expecting people to be making hundreds or thousands of method calls in the variables scope of a page? On the other hand there's nothing in the docs that says you shouldn't be able. And given that you can (easily) from the persistent scopes, why not?
Anyway, as a final note, although the request scope has the same performance issue (you can probably guess why), the only thing that's really needed apparently is for the object to start out in the application scope. If you create a pointer to the object in the variables scope, it will still perform fine. So the following code will also be free of the performance issue:
<cfset variables.agent = application.agent />

This seems to correspond with the symptoms of a long running performance issue we have had within an application where certain methods of a variables scoped CFC are called up to 2,000 times.
We have had issues since CF7, which we had managed to code round - but since moving to CF8, the problems are much more severe (i.e. one execution thread causing an instance of CF to use > 1GB RAM causing a java.lang.outofmemory error) - and despite refereing this to Adobe support, the "best" recomendation we had was to increase the amount of memory in the server, and allocate it to the instance(s) of CF which were running this part of the application...
I will have to try moving the object to the application scope (which should not be too onerous as this execution path is already single threaded) and see if this makes a difference.
just sayin' is all.
plowing everything into the app scope or aggressively loading all of your cfc's upfront also seems to go against what Ike's been doing more and more in his projects by "lazy loading" methods when needed.
i try to use the app scope only for things i'll need throughout the application (hence, the name "application scope").
100 records = 0ms (0.00ms / record)
300 records = 0ms (0.00ms / record)
1000 records = 15ms (0.02ms / record)
3000 records = 16ms (0.01ms / record)
I have this set up as "speedtest2.cfm" and "agent.cfc". Are you doing the speed test loop from inside a CFC or a framework? I'm on CF 8. Running against cacheboxagent, I get:
100 records = 719ms (7.19ms / record)
300 records = 2,703ms (9.01ms / record)
1000 records = 9,812ms (9.81ms / record)
3000 records = 24,844ms (8.28ms / record)
100 records = 16ms (0.16ms / record)
300 records = 78ms (0.26ms / record)
1000 records = 312ms (0.31ms / record)
3000 records = 891ms (0.30ms / record)
running 8.0.1 on a mediocre machine with a ton of stuff running.
1st run
100 records = 42ms (0.42ms / record)
300 records = 77ms (0.26ms / record)
1000 records = 269ms (0.27ms / record)
3000 records = 775ms (0.26ms / record)
2nd run
100 records = 46ms (0.46ms / record)
300 records = 81ms (0.27ms / record)
1000 records = 279ms (0.28ms / record)
3000 records = 744ms (0.25ms / record)
3rd run
100 records = 22ms (0.22ms / record)
300 records = 84ms (0.28ms / record)
1000 records = 263ms (0.26ms / record)
3000 records = 753ms (0.25ms / record)
I'm not aware that I've swapped out the JVM -- the jre path in CF Admin is C:/ColdFusion8/runtime/jre. I'm using CF 8.0.1 on Win XP Pro SP2 - 1.2 ghz dual core Athlon - it's an old machine and memory is scarce with just under 1gb physical on the system. I have fiddled with the JVM settings for memory and garbage collection.
Here are the current JVM settings reported in the admin:
-server -Dsun.io.useCanonCaches=false -XX:MaxPermSize=64m -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Thanks to everyone for the help! :)
Server Details
Server Product ColdFusion
Version 8,0,1,195765
Edition Developer
Serial Number Developer
Operating System Windows XP
OS Version 5.1
Adobe Driver Version 3.6 (Build 0017)
JVM Details
Java Version 1.6.0_04
Java Vendor Sun Microsystems Inc.
Java Vendor URL http://java.sun.com/
Java Home C:\ColdFusion8\runtime\jre
@Marc - yes absolutely. This is not intended as an endorsement of generically putting all your code in the application scope. You should only do that if it makes sense to do that, and simply seeing that this behavior occurs in some cases isn't enough evidence that you really need to. If for no other reason than that moving anything to the application scope introduces new race conditions that need to be considered and handled. So unless you're running into an actual, measurable performance problem, you should leave your variables scoped CFCs where they are. Fortunately I think Dan has this covered as he mentioned that particular part of their app already being single threaded. :)
@Sean - thanks - I think it's geometric though... exponential growth is much more aggressive I think.
I am replicate this issue and we will try to fix it as soon as possibly. Also I have observed that this is happening if Memory tracking is ON. If you switch off Memory tracking performance should be as usual.