A couple days ago I posted an article where I showed a really strange problem with CFC performance. In short, when executing many method calls (hundreds or more) on a CFC in the variables scope of a page (or the request scope), the system would bog down, gradually taking longer and longer to execute the method. Place the same CFC in the application scope (or any shared scope) and BAM problem solved, what had taken a couple of minutes to execute now executes in a couple seconds! And really it should execute in a couple of seconds in the request scope as well.
What this indicates is that there was a painters algorithm somewhere in the ColdFusion server. Sean Corfield had initially reported it as a bug, but after tracking it down, it seems like a low priority (assuming Adobe even considers it a bug, which they may not and I wouldn't blame them). Like many of the things we work with, regardless of the initial frustration, the final solution turns out to be something rather simple.
But I'll outline the steps I took to debug it briefly before I get to the solution.
I had initially done some testing with an empty CFC which had a couple of methods but no logic, just to see what the performance of the straight method calls would be, to rule out the possibility of anything else being cause of my problem. And as strange as it sounds, the tick-counts confirmed that indeed, doing nothing other than calling the method in a loop was causing it to bog down. Thankfully, Glenn Crocker offered the solution to put it in the application scope. As soon as I did that, performance reverted to the expected linear growth line.
This was the point at which I decided to post the blog entry partly because it seemed so strange and I figured if other people didn't know about it that they might want to know about it, but also because I wanted to see what other people might have to say about the problem. It certainly could have just been something that others had discovered before and I was just late to the party. It doesn't seem like that's the case at this point although I could still be mistaken.
Aaron Longnion asked for some more details about my environment such as the operating system, CF version, etc. and Sean had also mentioned referencing the article in the Adobe bug report. So of course, eager to provide any information that might be helpful to folks in diagnosis I posted a variety of information about my environment in the subsequent comments. After being unable to think of any better solutions, I figured I would take a back up of my configuration and reinstall the ColdFusion server.
That did it, the problem was solved.
So at that point I wanted to find out if the problem was an issue with my configuration or if it was a corrupt CF8 installation (at least at best guess). So I copied in my JVM arguments to my clean installation and it seems that having reduced the memory (including the permanent generation) and having changed the garbage collection options hadn't been the cause. But now I had a stack of neo-*.xml config files which the server uses to store the various settings from the ColdFusion administrator. In fact, I had two sets - I had one set from before the reinstall and I had a new set representing the default options from a clean installation of the server. So I backed up the default files, then copied my config files from before the reinstall into the working directory C:\ColdFusion8\lib\, restarted the service and tried again. Bang! I'd reproduced the problem.
At this point I knew that my configuration was at fault, I just needed to figure out which configuration caused it. Which settings would produce this behavior? I knew that there had been changes to the neo-datasource.xml for adding my DSNs and I knew there had been changes to neo-chron.xml for scheduled tasks. Neither of those seemed particularly likely culprits. I thought perhaps it had something to do with the general settings or maybe request tuning (although I didn't remember changing anything in that page of the admin).
I decided to open my copy of WinMerge and compare the directories to see just which xml files had been modified before going any further. I skipped by the couple I knew about, looking for something unfamiliar and saw that the neo-drivers.xml had changed, but that turned out to be a red herring as it merely contained some minor changes in the latest full release of CF8.0.1 (confirmed by a version number in the file). And then I stumbled upon neo-monitoring.xml... D'oh!
A while ago I had been trying to use the new server monitor in CF8 to figure out something that had been troubling me (without much luck) and had then turned all the monitoring features off and told myself I would come back to it later. And then I had posted a comment to the cf-talk mailing list in which I had described part of my problem with the server monitor in CF8, namely that it won't allow me to drill down into my CFCs to see where my memory is really being taken up, so the high-level analysis of "there's a bunch of stuff in application.xxx" wasn't very helpful.
Charlie Arehart had sent me an email off-list I think with a URL, but had made a suggestion or two and I figured I'd give it another try, hoping that maybe I had just overlooked something before. That turned out to not be the case as best I can tell, I can still only see "application.xxx #kb" but not drill down into application.xxx.
Apparently however that's where the problem started.
It seems I made the mistake of shutting my notebook down without first turning off memory tracking (which requires monitoring). Several days had passed and it seems the things I'd been working on hadn't run up on this issue until I saw the geometric performance cost of a CFC in the variables scope a couple days ago. By the time I saw that, the server monitor was out of sight and out of mind and so it didn't occur to me to go check and see if it was still enabled.
Looking at the Server Monitor with the neo-monitoring.xml restored, at the top of the display you can see the two red buttons indicating that memory tracking is enabled. Here's a screen-grab for reference:
So for anyone who is concerned about performance of CFCs in the variables scope -- if you're having any problems in that regard, make sure you check your CF Server Monitor and make sure that the memory tracking is turned off. Also as Marc mentioned before, this is not a reason to simply start putting CFCs in the application scope. There are valid reasons for CFCs to live in the variables scope of the page (or the request scope), so you really should just consider the use case and not worry about this with regard to deciding where to place your CFCs.
So to put a bit of a positive spin on my embarassment... ;) there are a couple of interesting things that have come out of this at least for me.
One is that if you ever have a problem that you suspect might be in your CF configuration, you now know that you can test that by backing up your CF config (jvm arguments and then zip-up all the neo-*.xml files in the lib directory), reinstall CF and compare them to the defaults. You might for that matter even like to just take a zip of the xml files after any new installation. That way you always know you have a copy of the defaults to compare against. Hopefully most of your issues won't be that difficult to track down, but as a last resort, this seems to be pretty effective for isolating configuration problems.
Secondly for anyone who's been curious about what precisely the effect of enabling the server monitor's memory tracking might be, this should give you a little more insight. :)
The only final comment I'll make is that although this does indicate that there's a painter's algorithm going on in the request scope as I mentioned before, it doesn't necessarily need a fix. If the folks at Adobe think they can change the behavior easily within their budgets and timelines, then great! But I certainly wouldn't consider that a priority, since you can of course just turn off the memory tracking. Although it does explain why I was recently unable to run the plugin manager with memory tracking enabled. So changing the behavior of the server monitor will help in some specific cases, I just don't see it as being a high priority.