
Checklist Analysis: Doing a Code Review

Last time, I published a checklist for doing code reviews that we use with my team. This time, I'd like to go into the details of each item and look at the rationale for its inclusion. This will be a long post, so bear with me. You can also skip ahead to any section or item that is of particular interest to you.

Let's go.

Overall checks

  • Read the PR title and description, to understand scope

The first step should be to review the author's comments on the Pull Request's purpose, so that we can align expectations. In our case, the tools we have for this are the title and description included in the PR. We also include in it a link to the related story, so that we can look up further details on its acceptance criteria. We should also know whether this PR completes the story or is an intermediate step towards that goal.

  • Verify that the list of modified files matches expectations

Once purpose and scope are clear, we should take a brief look at the list of modified files, to see if anything stands out as possibly being outside the scope of the current PR. If so, it would be good to look there first, in case a suggestion to split those changes into another PR is warranted.

Check tests

  • Read tests before looking at the implementation code

This is a point that I usually emphasize. I'm a big believer in the value of tests in software development, and I think that a well-written test clearly conveys the intent of and expectations for the code being tested. That's why I want to look at the tests first: even if development was not done using TDD, the tests need to cover the code's functionality thoroughly, and reading them without knowing the implementation details helps uncover any possible holes in the testing logic.

  • Does every implementation file have an accompanying test?

This is very important, as we want all our code to be tested, even though we don't aim for 100% code coverage, which is rarely feasible anyway. Every important path should have at least some kind of test ensuring its functionality, and implementation code with zero tests should be a red flag in any project that uses automated testing.

  • Does every test have an assertion?

A common pitfall in projects where the developers are unfamiliar with automated testing practices is aiming for code coverage instead of functional coverage. This is often exacerbated by management requiring a particular code coverage threshold to show that they care about code quality, which results in developers simply exercising code instead of testing its behavior. Every test should have an assertion (and ideally, only one assertion).
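To make the distinction concrete, here is a small Jest-style sketch (the applyDiscount function and its module are hypothetical). The first test merely exercises the code and counts towards coverage; the second actually verifies behavior.

```typescript
import { applyDiscount } from "./pricing"; // hypothetical module

// Exercises the code but asserts nothing: it stays green even if the result is wrong.
it("applies the loyalty discount", () => {
  applyDiscount(100, "loyalty");
});

// Actually tests behavior, with a single clear assertion about the outcome.
it("applies a 10% loyalty discount to the order total", () => {
  expect(applyDiscount(100, "loyalty")).toBe(90);
});
```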

  • Do the test descriptions match the assertions in the tests?

This is a clear follow-up to the previous check. Not only does every test need an assertion, but that assertion also needs to match what the test says it's testing, so that if (when) it fails, we can be sure of the reason for the failure.
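As a quick illustration, consider this sketch with a made-up formatUser helper: in the first test, the description promises one thing while the assertion checks another, so a red test would mislead the reader; in the second, they stay aligned.

```typescript
// Hypothetical helper and fixture, here only to illustrate the point.
const formatUser = (u: { name: string; email: string }) => ({ ...u });
const user = { name: "Ada Lovelace", email: "ada@example.com" };

// Description and assertion disagree: a failure points in the wrong direction.
it("returns the user's full name", () => {
  expect(formatUser(user).email).toBe("ada@example.com");
});

// Description and assertion agree: a failure says exactly what broke.
it("returns the user's email address", () => {
  expect(formatUser(user).email).toBe("ada@example.com");
});
```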

  • Can the test descriptions be read in isolation while remaining clear?

When a test fails, we usually get a red notice from the test runner with the test description we provided, along with a stack trace. If reading the test description alone is enough to point us in the right direction to find the incorrect code, it saves us a lot of time examining the test context, running through stack traces and generally debugging our code. The clearer the test description is at pointing out the expected behavior versus the actual one, the better.

  • Do tests have a clear narrative order?

A test suite (a collection of individual tests) needs to be both coherent and cohesive. The output of a passing test suite should read like a user's manual for the module under test, with each test building on the functionality already verified by the preceding tests. While we can, and should, occasionally ask the test runner to run our tests in a random order to look for possible couplings between them, running the tests sequentially should result in a clear description of our expectations for the module, without the need to go back and forth to understand what should be happening.
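As a sketch of what that narrative might look like, here is a hypothetical suite for an imaginary ShoppingCart module; only the order and wording of the descriptions matter for this point, so the test bodies are elided.

```typescript
describe("ShoppingCart", () => {
  it("starts empty", () => { /* ... */ });
  it("reports a total of zero while empty", () => { /* ... */ });
  it("adds an item and includes its price in the total", () => { /* ... */ });
  it("sums the prices of several added items", () => { /* ... */ });
  it("removes an item and updates the total accordingly", () => { /* ... */ });
});
```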

  • Do tests cover all the expected functionality?

While aiming for 100% code coverage is not conducive to good tests, we should always aim for 100% functional coverage. That means ensuring that every expected behavior is covered by our tests at least once, especially where business rules are concerned.

  • Are tests covering every edge case you can think of?

Edge cases are where the bugs live. If your tests don't cover the different scenarios that happen around the edge cases of your data, you're leaving the door open for bugs that will come back to bite you. Try to ensure as many of these doors as possible are closed before approving a merge to a stable branch, and your life will be much easier in the future. Trust me on this.
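For example, a hypothetical splitIntoPages helper might pass its happy-path test and still break at the boundaries; a minimal, self-contained sketch of the edge-case tests could look like this:

```typescript
// Minimal implementation, included only so the sketch is self-contained.
const splitIntoPages = <T>(items: T[], pageSize: number): T[][] => {
  const pages: T[][] = [];
  for (let i = 0; i < items.length; i += pageSize) {
    pages.push(items.slice(i, i + pageSize));
  }
  return pages;
};

// The happy path is easy; the boundaries are where the bugs tend to hide.
it("returns no pages for an empty list", () => {
  expect(splitIntoPages([], 10)).toEqual([]);
});

it("returns a single full page when the list is exactly one page long", () => {
  expect(splitIntoPages([1, 2, 3], 3)).toEqual([[1, 2, 3]]);
});

it("puts the leftover items on a final, shorter page", () => {
  expect(splitIntoPages([1, 2, 3, 4], 3)).toEqual([[1, 2, 3], [4]]);
});
```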

  • Is the data used in the tests properly mimicking the structure of real data?

Many times, in order to keep tests simple and readable, a simplified version of the data is used, but sometimes the data is so simplified that it is no longer useful for catching errors. For example, IDs do not follow the pattern that real IDs use, the data types do not fully match those of the corresponding fields, or the test data is based on an outdated spec for an external service's request or response formats. Ensuring the test data reflects, as much as possible, the real data that the application is going to see will help catch errors before they reach a production environment.
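A small sketch of the difference, with made-up field formats standing in for whatever the real service actually returns:

```typescript
// Oversimplified fixture: wrong shapes and types, so real parsing bugs slip through.
const oversimplifiedOrder = { id: 1, total: "100", createdAt: "yesterday" };

// Fixture mirroring the assumed real formats: UUID ids, totals in cents,
// ISO 8601 timestamps. The exact formats here are only an illustration.
const realisticOrder = {
  id: "3f2c9a4e-8d1b-4c6a-9f0e-1a2b3c4d5e6f",
  total: 10000,
  createdAt: "2021-06-15T09:30:00Z",
};
```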

  • WARNING Is there excessive mocking in a test?

This is the only "check" in the checklist that should never be checked. I'm not comfortable with the way it's worded, but I haven't found a better alternative yet; that is the reason for the bold warning sign at the beginning. If you have a better suggestion, please let me know in the comments. This check is also somewhat more subjective than the others in the list. I generally feel strongly against mocks, and I think a good rule of thumb is not to mock modules that are under your direct control and that no other project uses, as those APIs are more prone to change as the design changes. But sometimes you need to mock internal modules in order to allow two different changes to the code to proceed in parallel, and as long as you have taken the time to establish the interface clearly, this should be OK.
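As a rough illustration of where I draw the line, here is a Jest-style sketch (both modules and the endpoint path are hypothetical) that mocks only the boundary we don't control, an HTTP client wrapper, and leaves our own internal modules unmocked:

```typescript
// Mock the external boundary, not our own internal modules.
jest.mock("./httpClient", () => ({
  get: jest.fn().mockResolvedValue({ name: "Ada Lovelace" }),
}));

import { get } from "./httpClient";     // hypothetical HTTP client wrapper
import { loadUserName } from "./users"; // hypothetical module under test

it("loads the user's name from the profile service", async () => {
  await expect(loadUserName("user-1")).resolves.toBe("Ada Lovelace");
  expect(get).toHaveBeenCalledWith("/users/user-1");
});
```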

Check high-level implementation code

Every check in this section is intended to be part of a quick overview of the implementation code to uncover any obvious code smells. They are expected to be completed quickly before moving to a more in-depth analysis in the following section.

  • Are all variable names nouns?

To start with, every variable (and every immutable variable, or constant) should represent a concrete entity or concept of some sort, and as such, its name should be a noun. Its function is to answer the question: "What is this variable (or constant) storing?"

  • Are all function names verbs?

Likewise, function names should denote the actions those functions perform, and therefore need to be verbs. The question they need to answer is: "What does this function do?"
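A tiny sketch of how this check and the previous one look together, with names made up for illustration:

```typescript
// Variables and constants are nouns ("what is stored here?").
const invoiceTotal = 149.99;
const retryLimit = 3;

// Functions are verbs ("what does this do?").
function calculateDiscount(total: number): number {
  return total > 100 ? total * 0.1 : 0;
}

function sendReminderEmail(address: string): void {
  console.log(`Reminder sent to ${address}`);
}
```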

  • Are names meaningful?

Besides belonging to the right category of words, the names of variables (and constants) and functions need to be meaningful in order to help the reader understand the code's purpose. These names should usually come from the business language of the application, but they can also refer to technical concepts relevant to the module where they live.

  • Are names consistent with other uses in preexisting code?

Nothing is more frustrating to a programmer reading unfamiliar code than ambiguity in the terms used within it. Having functions called getUserName, fetchRecord, loadArticle and readDescription all used to obtain particular pieces of data from a database is particularly problematic, as they could easily be confused with another function called readUserName used for obtaining a user-provided value from some GUI. It would be better to reserve, for example, the "read" prefix for user-provided values and "load" for values read from the database, while using "get" for items obtained from configuration files. Using a single verb consistently for a particular type of operation helps a lot when introducing the code to a new team member.
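Sticking with that example, one possible convention could be sketched like this, with each verb reserved for a single kind of source (the signatures are illustrative only):

```typescript
interface Article { id: string; title: string }

// "read": values provided by the user through the GUI
declare function readUserName(): string;

// "load": values read from the database
declare function loadArticle(id: string): Promise<Article>;

// "get": values obtained from configuration files
declare function getApiTimeout(): number;
```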

  • Do module names match the abstraction they represent?

Much as we insisted on having the expectations of a test match its description, we want modules to represent clear abstractions, so that we can understand references to them from other parts of the code without having to dig into their source code. Choosing a name that creates the right expectations for the code is invaluable for keeping confusion to a minimum.

  • Are abstractions at the right level?

Again, this follows naturally from the previous rules. A high-level module that implements business rules is expected to handle the abstractions related to those rules, while keeping technical implementation details in modules that represent more technical concepts such as, for instance, a tree, a record or a database connection.
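For instance, a hypothetical loan-approval sketch (all names and the threshold are made up) might separate the levels like this: the business rule reads in business terms, while the persistence details stay behind a lower-level abstraction.

```typescript
interface LoanApplication { id: string; creditScore: number }

// Lower-level abstraction: purely technical concerns (persistence), no business rules.
interface LoanRepository {
  findById(id: string): Promise<LoanApplication>;
  markApproved(id: string): Promise<void>;
}

const MINIMUM_CREDIT_SCORE = 650;

// Higher-level abstraction: the business rule, expressed in business language.
async function approveLoan(repo: LoanRepository, applicationId: string): Promise<void> {
  const application = await repo.findById(applicationId);
  if (application.creditScore >= MINIMUM_CREDIT_SCORE) {
    await repo.markApproved(applicationId);
  }
}
```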

  • Are patterns used in preexisting code respected?

Following the rule on consistency mentioned before for the use of names, we should also try to keep code patterns uniform. This includes not only code style preferences like where to put curly braces or how to indent a file, but also decisions on, for instance, whether to use callbacks or Promises/Futures to handle asynchronous processing flow, or how to organize modules within the source code structure.

Check implementation details

This section finally deals with the details of the implementation, and the main goal here is to ensure the quality of the implementation code. There are only a few items, but answering them should help uncover other hidden problems. Remember that one implicit question that should be at the forefront of the reviewer's mind at every stage of the code review process is: "Is this code likely to hide any bugs?"

  • Does the implementation match the expectations from reading the tests?

Since we read the tests to make sure they are complete and correctly reflect the required features, the first step in ensuring the implementation is correct is to verify that it matches what we expect from the test descriptions and their implementation. Every test should serve as an example of how to use the production code, so any mismatch with the actual implementation should raise warnings.

  • Are all the code paths covered by the tests?

While 100% code coverage is not our goal, untested behavior is risky, as it means we have, by definition, undefined behavior happening somewhere in our code. Sometimes it is difficult to answer this question with perfect confidence just by reading the code, but the code coverage report will answer it for us, and that is where its usefulness lies.

  • Are functions short?

Short functions are easier to grok and to name, since they tend to have fewer responsibilities (see the next point). Also, when a function is short, it is easy to read in one go without scrolling, and there's less context to keep in the reader's mind in order to grasp what's happening. Ideally, functions should be only a handful of lines long and fit comfortably in a single screen of code or less.

  • Do functions have a single clear responsibility?

In order to keep functions short, they need to have a single responsibility. This also makes it easier to move them out of a module while refactoring if they need to be shared to avoid code duplication. Having a single responsibility also helps in naming a function, and a clear smell of a function that is doing too many things and needs to be broken down is the presence of a connector in its name (most notably: "and").
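A small, made-up example of that smell and one way to break it down:

```typescript
interface Order { items: string[] }
const database = { save(_order: Order): void { /* persistence elided */ } };

// The "And" in the name gives away that two responsibilities are tangled together.
function validateAndSaveOrder(order: Order): void {
  if (order.items.length === 0) throw new Error("Order must contain items");
  database.save(order);
}

// Splitting it yields two short functions that are easy to name, test and reuse.
function validateOrder(order: Order): void {
  if (order.items.length === 0) throw new Error("Order must contain items");
}

function saveOrder(order: Order): void {
  validateOrder(order);
  database.save(order);
}
```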

  • Is the exposed API minimal?

This applies to modules, objects, functions and any other reusable block of code. Having an object with lots of methods, a utility module with lots of functionality or a function with lots of parameters is a clear sign that the code in question is likely doing too much and should probably be split. Also, having a minimal API surface lowers the likelihood of it being used incorrectly, or in unexpected ways. It's best to hide the complexity behind a small API, which enables more aggressive refactors, as the clients of a module are less likely to be tightly coupled to its implementation details.
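As a closing sketch, a hypothetical invoice-rendering module might expose a single entry point and keep everything else private, which is the kind of minimal surface this check looks for:

```typescript
export interface Invoice {
  customer: string;
  lines: { item: string; amount: number }[];
}

// The only exported entry point: clients cannot couple to the helpers below.
export function renderInvoice(invoice: Invoice): string {
  return [formatHeader(invoice), ...invoice.lines.map(formatLine)].join("\n");
}

// Private helpers: free to change or remove without breaking any caller.
function formatHeader(invoice: Invoice): string {
  return `Invoice for ${invoice.customer}`;
}

function formatLine(line: { item: string; amount: number }): string {
  return `${line.item}: ${line.amount.toFixed(2)}`;
}
```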


This concludes the checklist analysis, and while I tried to keep the comments for each item short, I realize there's a lot that I left unsaid. I am likely to cover the missing pieces, or go into more detail on particular parts, in the future. Let me know if you want more information in a future post on any of these concepts.

Thanks for reading.
