Getting back into things... sorry for being so jumpy!

Hey guys!

I've posted several blogs here in the past few days... and in retrospect I realize that I've been pretty knee-jerk about some issues I've encountered as I've started updating code for CF9.

I guess let me back up a bit here... a lot of you know that in 2008 I was diagnosed with Asperger Syndrome around the time I released the CacheBox framework (which I'm flattered to see has been downloaded quite a bit and I'm happy folks are enjoying it). :) Then in 2009 I moved from Boston back to Dallas TX where I grew up. I wasn't real sure what I was going to do professionally, but I wanted to move back to be near my kids who I'd unfortunately not been around for because I'd been chasing jobs around the country since my ex and I split up in late 2000. After getting back, Tiffany and I realized that my kids were in a really bad situation. Their mother was developing Alzheimer's disease and so the kids care was getting progressively worse. I won't get into the litany of things that were going wrong here, but suffice to say that I had to start stepping up and taking over care of the kids. Child Protective Services was involved and it wasn't until February last year (2011) that I finally got custody of my kids (which thankfully is now permanent). So I've been pretty busy with all of that and I'm now officially on disability.

A few weeks ago I got a server that I could use and I discovered that I can actually host it out of our rent-house. So since I had a license for CF9 that I got just before we moved back, I started setting it up to work on this other project I wanted to build.

The thing is I haven't done any serious programming work in a few years now and so I'm jumping back into CF9 a bit late... and I guess I'm out of practice and this past few days as I've been debugging the upgrades I've found myself pretty jumpy. So I'm gonna try to be less knee-jerk about things from now on. ;P And maybe I can get back to being the cool-headed developer I used to be. :)

Thanks guys!

The onTap Framework is Ugly!

The onTap framework is ugly. Yep. You heard me, I said it. The onTap framework is ugly, and I don't mind admitting that...

As a programmer I spend a lot of time thinking about the "most elegant" way to solve a problem. Although in reality it's not really the most elegant way, it's just what I feel is the most elegant way that I can think of and that's currently available to me with my resources. It's always possible that someone else might have thought of a more elegant solution, or that there might be a more elegant solution that's simply not available to me, often due to financial constraints. The release of recent versions of ColdFusion and the addition of application-specific mappings actually resolved a number of ongoing issues I had personally with code I felt was "ugly" or "inelegant". And that's not the only time a ColdFusion upgrade has helped me to clean up something I had always struggled with. The addition of onMissingMethod made possible a long time dream for me of having a lazy-loading function library that could load utility functions on-demand. And don't think I'm being hyperbolic when I say "long time dream" -- I was trying to accomplish that with ColdFusion 5, immediately after CFSCRIPT was introduced and made custom functions possible in the first place. Yet with all the advancements to the core language, I still routinely struggle internally with this notion of "elegance".

Part of the problem is the way that people think. Scientists used to believe that humans followed a "path to action" like this: think -> do -> feel. So in this model, you would think about what you're going to do, do it and then afterward you would decide how you felt about that action. Was it good or bad? Should you do it again? This is a very logical way of handling the world, however, it turns out to be the opposite of the way we actually behave. Our actual paths to action (and this includes us programmers) looks like this: feel -> do -> think. This path is not rational, but it is very, very efficient, which is why our brains evolved this way. It's also the cause of what I've called opinion driven development (ODD). Andre Marquis explains how this works in this video here. In this model we have an emotional desire to do something like eat or play a game, we do it, and then afterward we rationalize that decision. Usually we create a "logical" explanation for our actions which is incorrect, because it assumes our actions were inspired by reason instead of our emotions. Even the belief that we behave rationally is inspired by our emotions -- it's uncomfortable to us to think that we might behave irrationally, regardless of how strong the empirical evidence is. And it's that discomfort, that very emotional discomfort, that makes it difficult for us to admit to irrational behavior. Although we can develop ways of thinking about things that allow us to entertain these ideas without that discomfort, specifically by developing a "growth mindset".

[More]

Software Engineering and the Learning Curve

Have you ever considered the age-old nature versus nurture debate? You should. What is it that makes a person a great software architect? Is it an innate talent that some guys have and some guys don't, determined by genes and the size of your brain? Or is it determined by a person's passion and a persistent effort to seek challenges and overcome obstacles?

I used to think that I wanted to work with the best. I say I used to, because I've come to realize that this idea has caused some problems for me in the past, so now I've changed my mind about it. If you know me at all, you're probably at least marginally familiar with my interesting job history. Yet in spite of these challenges including not having a degree, I'm still an Adobe Community Expert today. This itself is only true because I continue to push myself and in the past few years, more specifically because I've started pushing myself in an area unrelated to software: personal development and communication skills.

Joel Spolsky obviously wants to work with the best. Who wouldn't, right? He's talked about this both on his blog and in Inc Magazine. If you read his articles on the subject, there's a particular way he talks about his new hires and the way he courts them that used to make me think "wow". However after reading Carol Dweck's book Mindset: the new psychology of success, now I'm thinking "uh oh".

[More]

Opinion Driven Development (ODD)

Sean Corfield likes ColdFusion. He also likes regular expressions. Why is this important? Well, because Sean has said on various occasions that his first impressions of both of these technologies was rather different. Of regular expressions he said, "I always used to think regex was a bit of a hammer and that regex fans thought all problems were nails but as I've become more fluent with it, I've seen the light." Of ColdFusion he said, "I thought oh my god, what a horrible language, it's all full of tags." Yet today, Sean is an Adobe Community Expert for ColdFusion. Of course Sean's not the only one. We all learn from our experiences and our opinions about things do change over time. I was late to the party on dependency injection or inversion of control (IoC). So was Ray Camden. On the other hand I was very early on the scene with database abstraction, having started work on DataFaucet on CF5 before there were even CFCs available. At the time nobody really understood or liked the idea, but these days database abstractions are all over the place in ColdFusion, so obviously as a community we've grown to understand and like them.

[More]

Pay It Forward

What's your passion?

Software is one of my passions, but I'd also like to talk about another of my passions... one that's far more important than software.

Films that inspire thought aren't very common in Hollywood. Films that inspire action are rarer still. Today I was fortunate enough to see one such powerfully moving film: Pay It Forward.

[More]

BOLT!

There's been a lot of talk today about Adobe's release of the new Bolt IDE for ColdFusion.

hmmm... y'know, this movie's due out any day now...

Coincidence? ;)

Getting Rid of an Old Notebook?

A lot of folks have been talking lately about upgrading to the new MacBook Pro... or for that matter upgrading in general. And honestly I've been looking around at machines myself.

The only machine I have for development right now is an HP notebook with 1gb RAM (that was *with* an upgrade) and a 1ghz dual-core AMD Turion processor. It's a 64-bit processor although I was told by HP support that they wouldn't support me putting a 64-bit OS on it because there weren't drivers available for the hardware at the time. Anyway this machine cost me about $1600 new when I bought it a few years ago. Today I can go down the street to Staples and get any old generic, bottom-of-the-line HP notebook and for less than half what I spent on this machine it would be twice as fast and have 3x as much physical memory.

However right now I can't afford to upgrade. In lieu of that here's my pitch. If you're one of these guys who's currently upgrading and you've got and older notebook that you're not using anymore, you can contribute quite a lot to the continued development of the onTap framework and the DataFaucet ORM by donating your previous notebook.

My plan is to build a CLAM server (ColdFusion, Linux, Apache, MySQL). :) Then I'll disable the relevant server services on the notebook I have now where all my email and personal stuff is, and I'll do my primary development and testing on the new notebook with just those services on it.

Thanks. :)

MythBusters: Select * Redux

So a couple days ago I posted a blog about the myth that the wild card (*) in a SQL query degrades the performance of the query in comparison to a comparable query using explicit column declarations (for all the columns). As with many of the MythBusters TV episodes it seems I was a bit hasty to declare the myth busted. And as with the TV series I'm now going back and doing more testing. Will I eat crow? Find out in this thrilling new installment!

A fan writes:

In the extremely small test bed you created, you will not see any performance difference. But when working with tables with millions of rows, the performance increase gained by explicitly naming the columns versus using SELECT * can be very pronounced. I know this from personal experience, but that experience is only on MS-SQL 2K & 2K5. I do know on those platforms that if you use SELECT *, the DB engine will NOT be able to create an optimized, possibly pre-cached execution plan. On an extremely busy server, this can make a big difference in performance solely because it is waiting for CPU time to create the plan.

...

I have seen negative effects of SELECT * in as little as 100K rows. Using SELECT * also changes the way ODBC/JDBC drivers create the actual database calls, which can negatively impact performance as well.

If a DB Server is very active, and the main bottleneck is the CPU, then SELECT * is VERY detrimental to Query Performance. This is because the Execution Plan must be created on-the-fly for every single call of that query. That eats up CPU time, and it will also cause Lightweight SQL locks that can prevent other DB actions from happening until the locks are released.

And as I tried to illustrate in my post, there are things like MS-SQL Views that will flat-out BREAK if you use SELECT * and make ANY changes to the table columns or their "order" in the table design. Making this problem even worse, it does not even throw an error when this happens! MS-SQL will simply return 0 ROWS, with no error code, every time the View is called until it is manually recompiled by the user.

Okay, so lets review the myth. I was commenting on statements like one found on the SixSigns blog in which someone said "there is no excuse for using the *". The myth here is that irrespective of the width or volume of the table or the individual use case, the mere presence of a wild-card (*) will automatically degrade performance (as was stated in the previous quoted example) and is a big red warning flag indicating that the person who wrote the query is incompetent.

I'll reiterate here by saying that this myth is NOT that a query will return faster if it uses fewer columns, but that performance is automatically degaded when the same columns are returned using a wild card.

(I also didn't address the issue with views needing to be rebuilt, because that's not part of the myth. Moreover if you've worked with views SQL Server at all, you're well aware of this issue (one that Microsoft is in no hurry to fix even though they could) and hopefully have adopted a strategy of addressing it every time you alter a table. At several of the companies I've worked for this was done by maintaining a script with all the views in order of dependency and executing that script after any change to tables. If you have the proper permissions you can actually create a trigger on the sysobjects table that will recompile the views any time a column is added. It doesn't work so well if a column is dropped, but it's really poor design in SQL Server to begin with. And many people don't work with views at all, in which case this isn't an issue anyway, so it yet again is far from a reason for the wild-card to set off a big red flag.)

So following the previous reader comment, I removed most of the columns from my table, stripping it down to just four columns. I then duplicated all the data in my data set until I had 22 million records. Even with just this one very tiny table with two integers and two 35-character strings, this one database is now about 8 GIGABYTES in size. I then turned off all other applications (browsers, IM clients, etc.), disabled several non-essential Windows pocesses (including ColdFusion) and executed this query several times against SQL Server 2005 Express in SQL Studio:

select *
from MyTestTable
where col02 = 'UUID'

Using a notebook with 1gb RAM and a 1ghz dual-core processor, this query returned 2k records in an average of 3 minutes. I then replaced the wild card character with an explicit column list and executed the query several more times.

select col01, col02, col03, col04
from MyTestTable
where col02 = 'UUID'

Using the explicit column names the query executed also in an average of 3 minutes. It's interesting to note that the fastest time of 2:30 came from a query using the wild-card, while the slowest time of 5:00 came from a query using the explicit columns.

With 22 million records I ran out of drive space to create an index that would cover all four columns. I was curious to find out however in addition to the myth, what the performance difference would be between a 35-character varchar column and an integer column. I removed half of the records so I had about 12 million records and then created an index with all four columns. This time when I filtered using one of the varchar columns, the query returned in about 90 seconds. Then I filtered using one of the integer columns like this:

select *
from MyTestTable
where col01 = 4801

Without the index this query took the same 90 seconds as the varchar filter. When I used the integer column with the index however this query executed in less than a second, even with 11 million records in the table. However there is still no difference between the query with the wild-card and the query with explicit columns. They perform the same.

My inclination is to say based on these results that the SQL Server 2005 is expanding the columns in the table, caching the expanded list and then making the substitution before caching the execution plan. I say this because when I've restarted all the services and run the query with the wild-card, the first iteration of the query takes some time to complete and subsequent queries use the cached execution plan as expected. However when I switch between the wild-card and the query using the explicit columns, there is no initial load while it caches a new execution plan - it seems to use the same execution plan whether it's using the wild-card or explicit columns.

What remains is for me to test a "high traffic" situation.

I don't have a lot of equipment available at my disposal, so my test here is likely to be somewhat primitive. I am however using ColdFusion 8, which means that even with the developer edition on my notebook, I can use cfthread to simulate simultaneous requests. I'm limited to two additional threads per request, which means I can have at most three threads executing at a time (the main thread plus two additional threads).

I've created a page that loops over a list: 1,5,10,50. This is used to fetch that number of UUIDs from the table for a subsequent fetch operation. The page then loops over my choice of column lists with or without the wild-card and spawns two additional threads. In each thread (including the main thread), I then execute a cfhttp request to a second page which inserts a new record into the same table and performs a select statement to fetch records with the provided UUID's in the col02 column. It does this in a loop for ten iterations, just to make sure there's a good long overlap where all three threads are querying the database simultaneously with both inserts and selects. So as the data is being fetched, it's also being changed and the index is being updated.

I can tell you that throughout the duration of this test, both cores of my processor were pegged at 98-100% utilization except for very brief downward spikes. Checking the processes, SQL Server was consuming the majority of that CPU usage. So there should be no question that this is a "busy SQL Server".

When the page is asked to fetch records for only one UUID, it returns a little under two thousand records in the result and execution time for the query averages between 300 and 600. What remains is no movement on the performance between using the wild-card and using explicit columns. As the number of UUIDs passed to the querying page increase, the times to execute those queries also increase. With 5 IDs it's fetching about 9k records in about 1.5 seconds and the wild-card still has no effect. With 10 IDs it's fetching about 17k records in about 1.5-2.5 seconds and the wild-card still has no effect. Finally with 50 IDs it's fetching about 90k records in about 10 seconds and the performance difference? Still bupkiss.

To be honest I expected if there was going to be a performance difference it would be with the smaller queries. I had also run this test initially using the integer columns instead of the UUID columns as filters, which meant queries that executed much faster, but still failed to produce the desired result. And based on what I know about the caching of execution plans in SQL server, I would think actually that to run into a problem you would need to be executing a very wide variety of queries, which isn't happening here. But I think this pretty well proves that the mere number or size of the queries won't cause any wild-card problems related to the caching of execution plans... that is at least not unless you've already got much bigger problems to begin with.

The way to simulate problems with execution plan caching really would be to create a variety of random tables and then execute queries with random column lists and random selection criteria against them to produce the maximum number of possibly unique queries. The objective would be to create a system in which you had more than 10k (or whatever the limit is) unique SQL statements being executed. Note that this is different than the number of queries being executed. An individual query may be executed 4-billion times, it's still only ONE unique SQL statement (and using cfqueryparam tags will help to reduce the number of cached statements to improve performance). My guess based on what I've seen from these tests, is that in an environment in which you were executing enough unique SQL statements to cause SQL Server to be unable to handle caching its execution plans, the presence of wild-cards would be one of your smallest problems.

So is it plausible that a wild-card could cause a performance problem in lieu of selecting all the table's columns explicitly? Yes it's plausible... But that's not the myth.

What's the myth? The myth is that the mere presence of a wild-card character (*) in a select query is an automatic performance/scalability problem and that using a wild-card in production is a sign of an incompetent programmer.

That's not what we've seen however. What we've seen is that, with quite a bit of effort, I've been totally unable to reproduce a situation in which a wild-card performed any differently than explicitly declaring all columns. It seems that in a scenario in which all the columns are actually being used, creating a performance/scalability problem with a wild-card character is actually quite a challenge.

Myth BUSTED.

NOTE: New code for these tests can be downloaded below. These files should never be placed on a production server. I deliberately avoided using cfqueryparam tags to try and produce performance issues, which means these files are vulnerable to SQL injection. You've been warned! :)

Searching for Transfer

Am I the only one who's frustrated by trying to search Google for specific information about Transfer or Reactor? The names are short and snappy. They certainly do "pop". They also produce a freaking LANDSLIDE of false positives if you try and search for them. I find myself forced to add "ColdFusion" to the search terms to get any decent results. I imagine I would be similarly frustrated by "Hibernate" if I worked much with Java. The word "java" on the other hand is so rarely used to describe "coffee" that it's a non-issue. Go figure. :P

Sensitivity

I'd been contemplating writing this article since I woke up this morning. I wasn't sure if I was going to. What convinced me to go ahead and write it is because it's about a conversation I'd been having with a fellow colleague in the ColdFusion community and I believe they decided to stop talking to me because of my opinions on the subject. So although this isn't even remotely about ColdFusion I'm going to go ahead and open this up to broader discussion here since this is my blog and I'm interested in hearing others' opinions.

This started because I was pretty disgusted and disturbed (although not particularly surprised) by the recent decision of the Bush administration to deploy military troops within the US to supplement civilian police. It's been my impression since I was in my teens, that it's generally considered one of the singular most important events in world history (to say nothing of US history) that the US was created as a representative democracy with the intention of deliberately limiting the power of government. This is essentially the exact opposite of that. This is the short road to a police state.

[More]

Domain Scalpers

I hope every single one of them dies lonely, friendless and in extreme and prolonged pain.

A while back I'd purchased a relatively simple domain that nobody wanted... How do I know nobody wanted it? Because nobody had registered it. It cost me about $20. Great! And I set up a little site that never got very much traffic (as far as I know)... just a handful of computer programmers checking out some tools. Websites for engineers never see huge amounts of traffic, that's just the nature of the beast...

And then after a while I went through a period of depression and I let the domain lapse... and a scalper bought it... I discovered this last week actually... And I'm absolutely certain they paid the same $20 for it that I did originally... They replaced my small programming site with the typical domain scalper's one-page "this is supposed to look like a legit search engine site while actually being a scummy attempt to generate revenue on advertising while I wait for the original domain owner to pay me my exorbitant extortion rates" site.

But I was looking at the Network Solutions site to see what the options might be for making an offer to buy it back, since there wasn't any contact information on their pretend site. Apparently it costs $20 to make the offer ... and then the offer (separate) has to be at least $100... Okay... unpleasant, but still within the realm of reality...

And then I noticed this:

Hmm... Y'know, I really don't know what kind of offer they're going to expect...

Apparently they're not expecting that a human will assess the value of that domain.

More Entries

BlogCFC was created by Raymond Camden. This blog is running version 5.5.006. | Protected by Akismet | Blog with WordPress