Monday, December 03, 2007

What does automated testing miss?

A friend of mine once said that you can judge the intelligence of the driver of a car by the number of stickers plastered to the back of it. He asserted that the relationship is inversely proportional, which I found insanely hilarious at the time, and I still think it is a clever turn of phrase, even though it is essentially an unprovable generalization. The point is that saying "support world peace" fifty different ways doesn't make you a better person, and it often has the opposite effect of betraying a lack of aesthetic sense and poise.

Joel Spolsky of Joel on Software recently gave a talk at the Yale Computer Science department in which he made what I perceived to be a similar assertion about the relationship between the attention given to certain classes of bugs and the ability to test for them automatically:

And so one result of the new emphasis on automated testing was that the Vista release of Windows was extremely inconsistent and unpolished. Lots of obvious problems got through in the final product… none of which was a “bug” by the definition of the automated scripts, but every one of which contributed to the general feeling that Vista was a downgrade from XP. The geeky definition of quality won out over the suit’s definition; I’m sure the automated scripts for Windows Vista are running at 100% success right now at Microsoft, but it doesn’t help when just about every tech reviewer is advising people to stick with XP for as long as humanly possible.


Whether that is true in the case of Windows Vista is open for debate, but the notion that a suite of unit tests passing at 100% may still leave bugs and internal inconsistencies is one that has been on my mind a lot lately. Joel makes the point elsewhere in his talk that writing good code depends on writing good specs, and writing good specs is just as hard as writing good code. Writing good tests is just as hard, but for some reason there seems to be a belief out there that as long as you have lots of them, passing unit tests make your code good.
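To make that concrete, here is a minimal, hypothetical sketch in Python (the function and test names are invented for illustration). The author of is_leap_year remembered only the divisible-by-four rule, and wrote the tests from that same understanding. Every test passes and every line of the function is exercised, yet the function is still wrong for century years such as 1900.

```python
import unittest


def is_leap_year(year):
    # Hypothetical bug: only the divisible-by-4 rule is implemented.
    # The Gregorian exceptions are missing (1900 is not a leap year,
    # while 2000 is).
    return year % 4 == 0


class IsLeapYearTest(unittest.TestCase):
    # Both tests encode the author's incomplete understanding,
    # so both pass and the function is fully covered.
    def test_divisible_by_four(self):
        self.assertTrue(is_leap_year(2008))

    def test_not_divisible_by_four(self):
        self.assertFalse(is_leap_year(2007))


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsLeapYearTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
    # The suite reports success, but the function is still wrong:
    print(is_leap_year(1900))  # True, yet 1900 was not a leap year
```

Running it reports both tests as passing and then prints True for 1900. Adding more tests written from the same mental model would not help; a test plan that deliberately lists edge cases like century years is what exposes the gap.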

If something is difficult to unit test, such as stored procedures or user interfaces, a couple of things happen: First, a subtle prejudice develops against those areas (if they can't be tested, or just aren't, they must not be very important). Second, the wrong tool for the job gets used -- SQL finds its way into code instead of living in stored procedures, UI issues are deemed trivial so long as the feature being implemented at least works, etc.

It would be a mistake to argue that unit testing should not be done for these reasons, but there needs to be a greater awareness of the relative importance of other types of testing as well. We should take care not to think we have done a complete job of testing just because we can point to a huge stack of tests that exercise only one area of an application. Perhaps it would be wise to spend a little less time copying and pasting the same test and changing one piece of it each time, and a little more time writing a comprehensive test plan.

2 comments:

Anonymous said...

I have seen products with tens of thousands of tests that take days of CPU time to run and yet still don't test all of the major features or even test the GUI at all. There is definitely a tendency to value quantity over quality with testing. I think that tendency is driven by the non-technical team members, who don't really know whether or not a test is good or bad, and accelerated by the technical team members, who love quantifiable things.

The problem with defining any numeric goal is that it sets up a reward system; e.g., a goal of 100% of the tests passing rewards having no test failures. Humans in general are very good at adapting their behavior to the reward system, and geeks in particular are really good at gaming the system. (Cf. http://www.joeindie.com/images/dilbert-minivan.gif)

Anonymous said...

I totally agree. Unit tests are useful for testing paths through your code to make sure you have exercised them, but that doesn't mean that the code does what it's supposed to; it just means you've tried all the ways it doesn't do what it's supposed to. Plus, if the test requires 10 pages of setup to test one case, then testing boundary conditions becomes prohibitively expensive in terms of developer mindshare, if not in terms of test-running time.

All that unit tests prove is that the code in question does what the author wanted it to in every case, not that it does so correctly. For that you need integration tests and functional tests running at the UI level. More effort spent on functional tests, like Watir tests, will catch the bugs that users care about.