Wierd Performance Issue Resolved!

A couple days ago I posted an article where I showed a really strange problem with CFC performance. In short, when executing many method calls (hundreds or more) on a CFC in the variables scope of a page (or the request scope), the system would bog down, gradually taking longer and longer to execute the method. Place the same CFC in the application scope (or any shared scope) and BAM problem solved, what had taken a couple of minutes to execute now executes in a couple seconds! And really it should execute in a couple of seconds in the request scope as well.

What this indicates is that there was a painters algorithm somewhere in the ColdFusion server. Sean Corfield had initially reported it as a bug, but after tracking it down, it seems like a low priority (assuming Adobe even considers it a bug, which they may not and I wouldn't blame them). Like many of the things we work with, regardless of the initial frustration, the final solution turns out to be something rather simple.

But I'll outline the steps I took to debug it briefly before I get to the solution.

I had initially done some testing with an empty CFC which had a couple of methods but no logic, just to see what the performance of the straight method calls would be, to rule out the possibility of anything else being cause of my problem. And as strange as it sounds, the tick-counts confirmed that indeed, doing nothing other than calling the method in a loop was causing it to bog down. Thankfully, Glenn Crocker offered the solution to put it in the application scope. As soon as I did that, performance reverted to the expected linear growth line.

This was the point at which I decided to post the blog entry partly because it seemed so strange and I figured if other people didn't know about it that they might want to know about it, but also because I wanted to see what other people might have to say about the problem. It certainly could have just been something that others had discovered before and I was just late to the party. It doesn't seem like that's the case at this point although I could still be mistaken.

Aaron Longnion asked for some more details about my environment such as the operating system, CF version, etc. and Sean had also mentioned referencing the article in the Adobe bug report. So of course, eager to provide any information that might be helpful to folks in diagnosis I posted a variety of information about my environment in the subsequent comments. After being unable to think of any better solutions, I figured I would take a back up of my configuration and reinstall the ColdFusion server.

That did it, the problem was solved.

So at that point I wanted to find out if the problem was an issue with my configuration or if it was a corrupt CF8 installation (at least at best guess). So I copied in my JVM arguments to my clean installation and it seems that having reduced the memory (including the permanent generation) and having changed the garbage collection options hadn't been the cause. But now I had a stack of neo-*.xml config files which the server uses to store the various settings from the ColdFusion administrator. In fact, I had two sets - I had one set from before the reinstall and I had a new set representing the default options from a clean installation of the server. So I backed up the default files, then copied my config files from before the reinstall into the working directory C:\ColdFusion8\lib\, restarted the service and tried again. Bang! I'd reproduced the problem.

At this point I knew that my configuration was at fault, I just needed to figure out which configuration caused it. Which settings would produce this behavior? I knew that there had been changes to the neo-datasource.xml for adding my DSNs and I knew there had been changes to neo-chron.xml for scheduled tasks. Neither of those seemed particularly likely culprits. I thought perhaps it had something to do with the general settings or maybe request tuning (although I didn't remember changing anything in that page of the admin).

I decided to open my copy of WinMerge and compare the directories to see just which xml files had been modified before going any further. I skipped by the couple I knew about, looking for something unfamiliar and saw that the neo-drivers.xml had changed, but that turned out to be a red herring as it merely contained some minor changes in the latest full release of CF8.0.1 (confirmed by a version number in the file). And then I stumbled upon neo-monitoring.xml... D'oh!

A while ago I had been trying to use the new server monitor in CF8 to figure out something that had been troubling me (without much luck) and had then turned all the monitoring features off and told myself I would come back to it later. And then I had posted a comment to the cf-talk mailing list in which I had described part of my problem with the server monitor in CF8, namely that it won't allow me to drill down into my CFCs to see where my memory is really being taken up, so the high-level analysis of "there's a bunch of stuff in application.xxx" wasn't very helpful.

Charlie Arehart had sent me an email off-list I think with a URL, but had made a suggestion or two and I figured I'd give it another try, hoping that maybe I had just overlooked something before. That turned out to not be the case as best I can tell, I can still only see "application.xxx #kb" but not drill down into application.xxx.

Apparently however that's where the problem started.

It seems I made the mistake of shutting my notebook down without first turning off memory tracking (which requires monitoring). Several days had passed and it seems the things I'd been working on hadn't run up on this issue until I saw the geometric performance cost of a CFC in the variables scope a couple days ago. By the time I saw that, the server monitor was out of sight and out of mind and so it didn't occur to me to go check and see if it was still enabled.

Looking at the Server Monitor with the neo-monitoring.xml restored, at the top of the display you can see the two red buttons indicating that memory tracking is enabled. Here's a screen-grab for reference:

So for anyone who is concerned about performance of CFCs in the variables scope -- if you're having any problems in that regard, make sure you check your CF Server Monitor and make sure that the memory tracking is turned off. Also as Marc mentioned before, this is not a reason to simply start putting CFCs in the application scope. There are valid reasons for CFCs to live in the variables scope of the page (or the request scope), so you really should just consider the use case and not worry about this with regard to deciding where to place your CFCs.

So to put a bit of a positive spin on my embarassment... ;) there are a couple of interesting things that have come out of this at least for me.

One is that if you ever have a problem that you suspect might be in your CF configuration, you now know that you can test that by backing up your CF config (jvm arguments and then zip-up all the neo-*.xml files in the lib directory), reinstall CF and compare them to the defaults. You might for that matter even like to just take a zip of the xml files after any new installation. That way you always know you have a copy of the defaults to compare against. Hopefully most of your issues won't be that difficult to track down, but as a last resort, this seems to be pretty effective for isolating configuration problems.

Secondly for anyone who's been curious about what precisely the effect of enabling the server monitor's memory tracking might be, this should give you a little more insight. :)

The only final comment I'll make is that although this does indicate that there's a painter's algorithm going on in the request scope as I mentioned before, it doesn't necessarily need a fix. If the folks at Adobe think they can change the behavior easily within their budgets and timelines, then great! But I certainly wouldn't consider that a priority, since you can of course just turn off the memory tracking. Although it does explain why I was recently unable to run the plugin manager with memory tracking enabled. So changing the behavior of the server monitor will help in some specific cases, I just don't see it as being a high priority.

Related Blog Entries

Comments
marc esher's Gravatar First!

for real though, your persistence in figuring this out is admirable. I especially appreciate your walk through of how you debugged this. I don't know whether this laborious, persistent, methodical approach can be taught, or whether it's tacit, but at any rate, I'm glad you showed the world how you tracked it down.

excellent.

so if there's any specific takeaway, maybe part of it is that "if you see a geometric performance degradation, check your server settings first. debug off? check. memory monitoring off? check."?

I remember a while back a colleague and I were trying to debug a memory issue and we suspected a Schrodinger's cat scenario. first step was to turn memory monitoring off and our problem went away. if i remember correctly, this was shortly after reading the docs, or possibly charlie's posts, or seeing a mike brunt presentation (i can't remember... too many damn smart people saying obvious things), suggesting to check memory monitoring as a culprit.

thanks a lot for putting this out there.
# Posted By marc esher | 12/10/08 11:28 PM
Steve Withington's Gravatar congrats on tracking down the problem and thanks for sharing the details!
# Posted By Steve Withington | 12/11/08 12:11 AM
Sean Corfield's Gravatar If it's any consolation, I once turned on memory tracking on a production instance - intending to take just a few seconds of data - and when performance tanked (as expected), I shut the instance down. After that, the instance would never come back up cleanly. It would start up and the first request would cause it to tank. It took days before we realized what had happened: the memory tracking was still active! We took that server out of the load balancer, started it with no traffic and went in and switched off the memory tracking.
# Posted By Sean Corfield | 12/11/08 12:31 AM
ike's Gravatar Thanks guys! :)

@Sean - Aha! So I guess too, another good thing out of this is that if anyone else has that same issue with the instance not starting up again properly, that you can fix it without pulling it from the load-balancer by editing the neo-monitoring.xml. It is a wddx packet, but the variable names are pretty obvious - I had no problem at all identifying what they were for. Thanks for sharing! Hopefully this will be useful for anyone else who needs to get those few seconds of analysis from a load-balanced environment.
# Posted By ike | 12/11/08 1:09 AM
Glyn Jackson's Gravatar thanks for the write up. I did not know I had a problem until reading this, I have already starting moving all my CFCs to the application scope as a matter of course but one of our cf apps gets slower as the day goes on, well i am sure thats not the case now.
# Posted By Glyn Jackson | 12/22/08 4:34 PM
Sean Corfield's Gravatar @Glyn, "I have already starting moving all my CFCs to the application scope as a matter of course" - you *did* read this article, yes? Isaac said that the issue was *not* about putting CFCs in application scope - it was about the server monitor being accidentally switched on.

If you put CFCs in application scope, you'll run into thread safety issues unless you are extremely careful and VAR-declare all your function local variables (and you need to remember that the variables scope of a CFC that is in application scope is effectively the same as application scope itself).

Only singleton CFCs belong in application scope. Domain objects should not be placed in application scope.
# Posted By Sean Corfield | 12/24/08 1:32 AM
ike's Gravatar I generally agree with Sean here too. Moving the object to the application scope was really a matter of analyzing the performance issue, which was determined to be memory tracking in the server monitor. It shouldn't affect how you architect your applications.
# Posted By ike | 12/24/08 7:25 PM
Glyn Jackson's Gravatar I was not putting my CFC in the application scope because of this post but simply because using the application scope to cache components saves me from creating an new instance with each request.... well that's my understanding please correct me if I am wrong. I read your post "CFCs, Scopes & Thread Safety" a while back when I was having some problems with CFLOGIN sharing the same session across different users, at the time dont think it was the problem i was having it was more to do with me not making use of cflock exclusive. does the same problem still exist with sessions inside CFC's when stored in application scope? never really had an issue but you have me worried now

my CFCs in the app scope are all SQL and security functions which should not stuffer from this and if they do would not matter much, however I do have a cart CFC which I will revisit in light of this thanks.

is CFCs, Scopes & Thread Safety documented more anywhere else, it's very important something which I have overlooked and forgot about until your comment/post, lots of blogs recommend using the app scope for CFC catching but don't explain the best implementation for this.
# Posted By Glyn Jackson | 12/29/08 7:40 AM
Glyn Jackson's Gravatar very sorry to hijack the post, please delete this comment if it does not benefit anyone else.
you guys have me worried now.... looking at my cart CFC I seem to have used cfset = 0 or null to 'declare' my local variables which means from my understanding I should have no issue right? I dont use any scopes like cf. however I pass in one argument called proID but this should not be a problem as it not a local variable but passed in.
# Posted By Glyn Jackson | 12/29/08 7:56 AM
Glyn Jackson's Gravatar O my I have found more, forget my comment on
"my CFCs in the app scope are all SQL and security functions which should not stuffer from this and if they do would not matter much"

because it does! I use cfcatch a lot are you saying that i need <cfset var cfcatch = 0 />? I will stop post now, sorry Isaac I dont mean to deface your blog but the shock I could have 10 sites with this issue has be worried now!
# Posted By Glyn Jackson | 12/29/08 8:08 AM
Sean Corfield's Gravatar Glyn, if you have cfquery name="foo" in your CFCs and you have not var-declared foo, you have a thread safety issue. You'll definitely get better responses if you join/post to cfcdev tho'...
# Posted By Sean Corfield | 12/29/08 12:54 PM
ike's Gravatar @Glyn - I don't see anything wrong with your comments staying here. The conversation is important to help people understand what's going on, and if it drifts off onto related subjects like var scoping, well, that's just the nature of conversation. :) Although folks might be willing to review the code for you if you want. Requests for code review are occasionally posted on the cf-talk list and someone's usually happy to help.
# Posted By ike | 12/29/08 4:19 PM
Glyn Jackson's Gravatar Thanks, I have posted a comment on CF-Talk list (first time). Keep up the good work Isaac. I really like reading your posts its like I can see the process you work through in the way you write, long winded but helps me understand a the message better.
# Posted By Glyn Jackson | 12/30/08 6:27 AM
ike's Gravatar Thanks Glyn, I'm glad you find them useful. :)
# Posted By ike | 12/31/08 1:46 AM
Dipak's Gravatar @ike,
Do you think, the following line in your code should be outside of the loop?
<cfset agent = CreateObject("component","agent").init() />

Will it have an extra processing/overhead if it is inside the loop? Though in my testing I do not see the performance difference, but still I would prefer to put it outside of the loop. Suggestions?
# Posted By Dipak | 1/6/09 12:08 PM
ike's Gravatar @Dipak - I went ahead and moved it in the code example in the previous page. It actually would not effect the test results regardless, even if creating the component took a solid minute (which it never would, I'm just using that as an exaggerated example). The test code gives you times for executing 4 different series of method calls, so within each iteration of the outer loop it then sets the start-point for timing the inner loop after having created the CFC. So since the CreateObject() call occurs before the start-time is set, it's never included in the result times. It is very slightly more efficient to run the test page with the CreateObject() outside the outer loop, but only by about 5ms or so.
# Posted By ike | 1/7/09 7:24 AM
BlogCFC was created by Raymond Camden. This blog is running version 5.5.006. | Protected by Akismet | Blog with WordPress