TDD and complete test coverage where exponential test cases are needed

by maple_shaft   Last Updated April 18, 2018 16:05 PM

I am working on a list comparator to assist sorting an unordered list of search results per very specific requirements from our client. The requirements call for a ranked relevance algorithm with the following rules in order of importance:

  1. Exact match on name
  2. All words of search query in name or a synonym of the result
  3. Some words of search query in name or synonym of the result (% descending)
  4. All words of the search query in the description
  5. Some words of the search query in the description (% descending)
  6. Last modified date descending

The natural design choice for this comparator seemed to be a scored ranking based on powers of 2. The sum of lesser important rules can never be more than a positive match on a higher importance rule. This is achieved by the following score:

  1. 32
  2. 16
  3. 8 (Secondary tie-breaker score based on % descending)
  4. 4
  5. 2 (Secondary tie-breaker score based on % descending)
  6. 1
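The weighting scheme above can be sketched as follows; the rule names and the score function are illustrative, not taken from an actual implementation:

```python
# Minimal sketch of the power-of-2 scoring idea. Each rule contributes a
# distinct bit, so the sum of all lower-priority rules (31) can never
# outrank a positive match on a higher-priority rule (32).
RULE_WEIGHTS = {
    "exact_name_match": 32,
    "all_words_in_name_or_synonym": 16,
    "some_words_in_name_or_synonym": 8,
    "all_words_in_description": 4,
    "some_words_in_description": 2,
    "recently_modified": 1,
}

def score(matched_rules):
    """Sum the weights of the rules a search result satisfies."""
    return sum(RULE_WEIGHTS[rule] for rule in matched_rules)

# A result matching only the top rule beats one matching every lower rule:
top_only = score(["exact_name_match"])                      # 32
all_lower = score(["all_words_in_name_or_synonym",
                   "some_words_in_name_or_synonym",
                   "all_words_in_description",
                   "some_words_in_description",
                   "recently_modified"])                    # 31
assert top_only > all_lower
```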

In the spirit of TDD I decided to start with my unit tests first. A test case for each unique scenario would mean at least 63 unique test cases (2^6 - 1 combinations of the six rules), not counting additional test cases for the secondary tie-breaker logic on rules 3 and 5. This seems overbearing.

The actual number of tests will be lower, though. Based on the rules themselves, certain rules guarantee that lower rules will always be true (e.g. when "All search query words appear in description" is true, then "Some search query words appear in description" must also be true). Still, is the level of effort in writing out each of these test cases worth it? Is this the level of testing that is typically called for when talking about 100% test coverage in TDD? If not, what would be an acceptable alternative testing strategy?
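To get a feel for how much the implications shrink the space, here is a hedged sketch that enumerates rule outcomes and discards the impossible combinations; only the two all/some implications mentioned above are modeled, and the predicate names are made up:

```python
from itertools import product

# Each combination is a tuple of booleans for rules 1..6.
# Rule 2 (all words in name/synonym) implies rule 3 (some words);
# rule 4 (all words in description) implies rule 5 (some words).
def valid(combo):
    r1, r2, r3, r4, r5, r6 = combo
    if r2 and not r3:
        return False
    if r4 and not r5:
        return False
    return True

combos = [c for c in product([False, True], repeat=6) if valid(c)]
print(len(combos))  # 36 of the 64 raw combinations survive
```

So even before any smarter test design, the rule structure alone rules out almost half of the naive combinations.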

Answers 6

Still is the level of effort in writing out each of these test cases worth it?

You'll need to define "worth it". The problem with this sort of scenario is that the tests will have a diminishing return on usefulness. Certainly the first test you write will be totally worth it. It can find obvious errors in the priority and even things like parsing errors when trying to break up the words.

The second test will be worth it because it covers a different path through the code, probably checking another priority relation.

The 63rd test will probably not be worth it because it's something you're 99.99% confident is covered by the logic of your code or another test.

Is this the level of testing that is typically called for when talking about 100% test coverage in TDD?

My understanding is 100% coverage means that all of the code paths are exercised. This does not mean you do all combinations of your rules, but all the different paths your code could go down (as you point out, some combinations can't exist in code). But since you're doing TDD, there is no "code" yet to check paths for. The letter of the process would say make all 63+.

Personally, I find 100% coverage to be a pipe dream. Beyond that, it's impractical. Unit tests exist to serve you, not vice versa. As you write more tests, you get diminishing returns on the benefit (the likelihood that a test prevents a bug, plus the confidence that the code is correct). What your code does determines where on that sliding scale you stop writing tests. If your code is running a nuclear reactor, then maybe all 63+ tests are worth it. If your code is organizing your music archive, then you could probably get away with a lot less.

January 02, 2014 14:00 PM

I'd argue that this is a perfect case for TDD.

You have a known set of criteria to test, with a logical breakdown of those cases. Assuming you are going to unit test them either now or later, it seems to make sense to take the known results and build around them, ensuring you are, in fact, covering each of the rules independently.

Plus, you get to find out as you go if adding a new search rule breaks an existing rule. If you do these all at the end of coding, you presumably run a bigger risk of having to change one to fix one, which breaks another, which breaks another... And, you learn as you implement the rules whether your design is valid or needs tweaking.

Wonko the Sane
January 02, 2014 14:02 PM

Consider writing a class that goes through a predefined list of conditions and multiplies a current score by 2 for every successful check.

This can be tested very easily, using just a couple of mocked tests.

Then you can write a class for each condition and there are only 2 tests for each case.

I'm not really understanding your use case, but hopefully this example will help.

public class ScoreBuilder
{
    private ISingleScorableCondition[] _conditions;

    public ScoreBuilder(ISingleScorableCondition[] conditions)
    {
        _conditions = conditions;
    }

    public int GetScore(string toBeScored)
    {
        int score = 0;
        foreach (var condition in _conditions)
        {
            // Test each individual condition (not the array itself)
            if (condition.Test(toBeScored))
            {
                // score this somehow, e.g. score += condition.Weight;
            }
        }
        return score;
    }
}

public class ExactMatchOnNameCondition : ISingleScorableCondition
{
    private IDataSource _dataSource;

    public ExactMatchOnNameCondition(IDataSource dataSource)
    {
        _dataSource = dataSource;
    }

    public bool Test(string toBeTested)
    {
        return _dataSource.Contains(toBeTested);
    }
}

// etc

You'll notice that your 2^conditions tests quickly come down to 4+(2*conditions). Sixteen is much less overbearing than 64. And if you add another condition later, you don't have to change ANY of the existing classes (open-closed principle), so you don't have to write 64 new tests; you just add another class with 2 new tests and inject it into your ScoreBuilder class.
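As a rough sketch of the "two tests per condition" idea (in Python rather than C#, with a hand-rolled fake standing in for the IDataSource dependency; all names here are illustrative):

```python
# A fake data source lets each condition be tested in isolation,
# with one positive and one negative case per condition.
class FakeDataSource:
    def __init__(self, names):
        self._names = set(names)

    def contains(self, text):
        return text in self._names

class ExactMatchOnNameCondition:
    def __init__(self, data_source):
        self._data_source = data_source

    def test(self, text):
        return self._data_source.contains(text)

source = FakeDataSource(["solar panel"])
condition = ExactMatchOnNameCondition(source)
assert condition.test("solar panel") is True    # positive case
assert condition.test("wind turbine") is False  # negative case
```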

January 02, 2014 14:10 PM

I am not a fan of strictly interpreting 100% test coverage as writing specs against every single method or testing every permutation of the code. Doing this fanatically tends to lead to a test-driven design of your classes that doesn't properly encapsulate business logic, and it yields tests/specs that are generally meaningless as descriptions of the business logic they support. Instead, I structure the tests much like the business rules themselves and strive to exercise every conditional branch of the code, with the explicit expectation that the tests are as easily understood as use cases generally are, and that they actually describe the business rules that were implemented.

With this idea in mind, I would exhaustively unit test the 6 ranking factors you listed in isolation from each other, followed up with 2 or 3 integration-style tests that ensure you're rolling up your results to the expected overall ranking values. For example, for case #1, Exact Match on Name, I would have at least two unit tests: one where the match is exact and one where it is not, verifying that the two scenarios return the expected scores. If matching is case-sensitive, then also a case comparing "Exact Match" vs. "exact match", and possibly other input variations such as punctuation, extra spaces, etc., confirming they also return the expected scores.

Once I've worked through all the individual factors contributing to the ranking scores, I essentially assume these to be functioning correctly at the integration level and focus on ensuring their combined factors correctly contribute to the final expected ranking score.

Assuming cases #2/#3 and #4/#5 are generalized to the same underlying methods, just passing different fields in, you only have to write one set of unit tests for the underlying methods plus simple additional unit tests for the specific fields (title, name, description, etc.) and their designated scoring factors, which further reduces the redundancy of your overall testing effort.
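A hedged sketch of that generalization: one helper computes the fraction of query words found in any text field, so a single set of tests covers the word-matching logic for name and description alike (the function name and examples are assumptions):

```python
# One generalized word-match helper, reusable for rules #2-#5 by
# passing in either the name field or the description field.
def fraction_of_query_words_in(query, text):
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)

# One set of tests covers the logic, whichever field is passed in:
assert fraction_of_query_words_in("red bicycle", "red bicycle sale") == 1.0
assert fraction_of_query_words_in("red bicycle", "blue bicycle") == 0.5
assert fraction_of_query_words_in("red bicycle", "green car") == 0.0
```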

With this approach, I'd probably end up with 3 or 4 unit tests on case #1; perhaps 10 specs on the some/all cases with synonyms accounted for, plus 4 specs on correct scoring of cases #2 - #5; 2 to 3 specs on the final date-ordered ranking; and then 3 or 4 integration-level tests that measure all 6 cases combined in likely ways (forget about obscure edge cases for now unless you clearly see a problem in your code that needs to be exercised) and ensure the ranking does not get violated or broken by later revisions. That yields about 25 specs to exercise 100% of the code written (even though you didn't directly call 100% of the methods written).

Michael Lang
January 02, 2014 14:54 PM

Your question implies that TDD has something to do with "writing all test cases first". IMHO that's not "in the spirit of TDD"; actually, it's against it. Remember that TDD stands for "test-driven development", so you need only those test cases which really "drive" your implementation, no more. And as long as your implementation is not designed in a way where the number of code blocks grows exponentially with each new requirement, you won't need an exponential number of test cases either. In your example, the TDD cycle will probably look like this:

  • start with the first requirement from your list : words with "Exact match on name" must get a higher score than everything else
  • now you write a first test case for this (for example: a word matching a given query) and implement the minimal amount of working code which makes that test pass
  • add a second test case for the first requirement (for example: a word not matching the query), and before adding a new test case, change your existing code until the 2nd test passes
  • depending on details of your implementation, feel free to add more test cases, for example an empty query, an empty word, etc. (remember: TDD is a white-box approach; you can make use of the fact that you know your implementation when you design your test cases).
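A minimal sketch of that first red/green cycle (the score function and its values are illustrative, not a real implementation):

```python
# Minimal implementation that makes the first two tests pass;
# later requirements will force this to grow.
def score(query, name):
    return 32 if query == name else 0

# test 1: a name matching the query exactly gets the top score
assert score("solar panel", "solar panel") == 32
# test 2: a non-matching name does not
assert score("solar panel", "wind turbine") == 0
```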

Then, start with the 2nd requirement:

  • "All words of search query in name or a synonym of the result" must get a lower score than "Exact match on name", but a higher score than everything else.
  • now build test cases for this new requirement, just like above, one after another, and implement the next part of your code after each new test. Don't forget to refactor in between, your code as well as your test cases.

Here comes the catch: when you add test cases for requirement/category number "n", you only have to add tests making sure that the score of category "n-1" is higher than the score for category "n". You do not have to add test cases for every other combination of the categories 1,...,n-1, since the tests you have written before already make sure that the scores of those categories are still in the correct order.

So this gives you a number of test cases which grows approximately linearly with the number of requirements, not exponentially.
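The linear growth can be sketched as one ordering check per adjacent pair of categories, with transitivity covering every other pair (the scores here are illustrative):

```python
# Category scores from highest priority (1) to lowest (6).
scores = [32, 16, 8, 4, 2, 1]

# One ordering test per adjacent pair: 5 tests for 6 categories,
# instead of one test per combination of categories.
for higher, lower in zip(scores, scores[1:]):
    assert higher > lower

# Transitivity then guarantees e.g. category 1 > category 6:
assert scores[0] > scores[-1]
```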

Doc Brown
January 02, 2014 15:34 PM

I've never been a fan of 100% test coverage. In my experience, if something is simple enough to test with only one or two test cases, then it's simple enough to rarely fail. When it does fail it's usually due to architectural changes that would require test changes anyway.

That being said, for requirements like yours I always unit test thoroughly, even on personal projects where no one is making me, because those are the cases when unit testing saves you time and aggravation. The more unit tests required to test something, the more time unit tests will save.

That's because you can only hold so many things in your head at once. If you're trying to write code that works for 63 different combinations, it's often difficult to fix one combination without breaking another. You end up manually testing other combinations over and over. Manual testing is much slower, which makes you not want to rerun every possible combination every time you make a change. That makes you more likely to miss something, and more likely to waste time pursuing paths that don't work for all cases.

Aside from the time saved compared to manual testing, there is much less mental strain, which makes it easier to focus on the problem at hand without worrying about accidentally introducing regressions. That lets you work faster and longer without burnout. In my opinion, the mental health benefits alone are worth the cost of unit testing complex code, even if it didn't save you any time.

Karl Bielefeldt
January 02, 2014 16:14 PM
