Solving Release Quality Issues

How we significantly reduced our production bugs with strong QA testing.

Published March 14, 2023

#agile-in-practice #git-workflow #testing

This is another addition to our ongoing Agile in Practice series. You can get up to speed on where we were by looking at where we began.

Issue

Background: The State of the Product

Nuve had developed an increasingly complicated product that linked code across 3 different codebases: Front End Interfaces, Back End Orchestration, and SAP Systems running on cloud infrastructure.

We had done a good job writing automated tests to check functionality within each codebase, which kept the number of bugs low within the front end code, back end code, and SAP systems.

Background: Our Git Workflow

We were using GitHub Flow, which essentially meant the following (see the command sketch after this list):

  • Our main branch was regularly deployed to production
  • Our feature branches were always branched off of main then merged back into main
  • After feature branches were merged in, automated tests were run prior to releasing main to production
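
In day-to-day terms, that looked roughly like this (a sketch only; the branch name is illustrative, not taken from our actual repos):

    # Start a feature from main
    git checkout main
    git pull origin main
    git checkout -b feature/helpful-error-message   # illustrative branch name

    # Commit work, push, and open a pull request into main
    git push -u origin feature/helpful-error-message

    # After review and passing automated tests, the PR is merged
    # and main is deployed to production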

The Issue that Arose

While our automated tests would pass, signifying that our Back End, Front End, and SAP Systems were each essentially bug free, we would sometimes see user-facing bugs after release. That’s because, while the code within each codebase was well tested and therefore largely bug free, the interactions between codebases were not always well tested.

For example: A user might do something that wasn’t allowed - something that should show a helpful error message. Our back end codebase was sending an error message to the (separate) front end interface codebase. But the front end codebase didn’t know how to handle the message. So it wouldn’t show anything to the user, and the user would get stuck. They didn’t get any feedback that they had done something incorrectly, which led to frustration.

There isn’t really a way to write a unit test for this type of bug because the issue sits at the intersection between two codebases. It’s an “integration test” issue because it arises at the integration point between two systems.
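
To make the gap concrete, an automated check at that integration point has to exercise both sides of the boundary at once - for example, calling the real back end and asserting that the error response has the shape the front end expects. A minimal sketch (the endpoint, payload, and field names are hypothetical, purely for illustration):

    # Hypothetical integration check: does the back end return an error
    # payload in the shape the front end knows how to render?
    response=$(curl -s -X POST https://backend.example.com/api/orders \
      -H "Content-Type: application/json" \
      -d '{"quantity": -1}')   # an intentionally invalid request

    # In this sketch, the front end expects a human-readable "message" field
    echo "$response" | jq -e '.message | length > 0' > /dev/null \
      || echo "FAIL: error response has no user-facing message"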

Solution

Agile Brainstorming: Options for Solving

We considered a number of possible solutions:

Possibility 1: Write Automated Integration Tests

Good idea, but in practice, we had seen automated integration tests become very time-consuming to build and maintain, so we filed that away as something we might do in the future.

Possibility 2: Test all Integrations in Your Isolated Development Environment

We decided to do this because it’s good practice to test your own code end to end. But it wasn’t quite enough to give us peace of mind because a) it put all the responsibility onto one developer to think of every test case, and b) it didn’t allow us to test complicated releases that could cause issues.

Possibility 3: Create a Production-Like Quality Assurance (QA) Environment

Basically, we would clone our production environment (with the exception of sensitive production customer data), and we would release changes there prior to releasing them to production.

This was the idea that really met our needs. It enabled us to do the following:

  • Test our release to a production-like environment to ensure our production release would go smoothly
  • After releasing to QA, all developers could test each other’s work prior to the production release - more eyes meant more opportunities to catch bugs
  • And it had additional benefits:
    • If you had never released to production before, this was a good place to practice
    • If you had to make changes to infrastructure (e.g. web or database servers), you could test those changes before making them in production, reducing the chances of an infrastructure issue

Implementation: A New Git Workflow

This led to an interesting question: Should we just release our main git branch to QA and then later release it to production after testing on QA? Or should we do something else?

There were good arguments in both directions, but given how easy it is to merge one branch into another using GitHub, we settled on creating a new branch, which led to the following equivalency:

  • main branch = Production
  • develop branch = QA

In other words, everything that is on main should be well tested enough to release to production (and production releases should happen quickly after changes are merged into main).

Also, everything that is on develop should be well tested enough to release to QA (and QA releases should happen quickly after changes are merged into develop).

In summary, our git workflow became:

  • develop is branched off of main; a production release means merging develop back into main and deploying to production
  • feature branches should be branched off of develop and merged back into develop for release to QA

Simply put, code started to flow like this:

feature-branch (Dev environment) → develop (QA) → main (Production)
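
In git terms, a release cycle looked roughly like the following (the commands are illustrative, and the actual deploy steps depend on your pipeline):

    # One-time setup: create develop from main
    git checkout main && git pull origin main
    git checkout -b develop
    git push -u origin develop

    # Feature work branches off develop...
    git checkout develop && git pull origin develop
    git checkout -b feature/helpful-error-message   # illustrative branch name
    # ...and is merged back into develop (via pull request), which deploys to QA

    # A production release merges develop into main, which deploys to production
    git checkout main && git pull origin main
    git merge --no-ff develop
    git push origin main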

Implementation: When Do We Merge Features to QA?

We suggested the following readiness principle:

If you merge a feature into develop (i.e. release it to QA), you should be very confident that it’s tested well enough that you will be able to merge it into main and release to production within 24 hours.

Essentially, it’s bad form to break QA for a long time because you didn’t test well enough in your own isolated system. That behavior blocks other people, and while they have their own isolated systems, they can only develop in isolation for so long. They are pushing to integrate their code into QA in preparation for a production release.
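
A lightweight way to keep yourself honest about that principle is to check how much unreleased work is sitting on develop (again, an illustrative command, not something the workflow mandated):

    # Commits that are on develop (QA) but not yet on main (production)
    git log --oneline main..develop

    # If that list is long, or its oldest commit is more than a day old,
    # QA is drifting away from production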

Results

This change, arguably more than anything else, reduced the number of user-facing bugs in production releases. The bugs became obvious when we tested changes in a near-identical-to-production environment.

The change also enforced a degree of discipline leading up to releases. We knew that QA had to be “stable” prior to starting a production release, so people could rally to make that happen if there were bugs on QA prior to a planned release. We also could delay a release (generally for less than 24 hours based on our readiness principle) if we knew QA wasn’t stable.

As a final comment, it might sound like this adds some overhead/maintenance costs to the development process. That is true, but we’d argue that it’s a “good” kind of maintenance cost. Because QA is so similar to production, arguably, every issue that arises on QA could have arisen in production instead if we didn’t see it in QA first! In our opinion, it’s better to have those issues arise in QA than in production and to get valuable practice fixing them.


Related Posts


Agile: Where We Started

Nuve started with a bare-bones agile development setup. This post walks through the basic elements and why each was included.

#agile-in-practice #agile-journey #rapid-development

Why Isolated Development Environments?

How isolated development environments enabled fast, high quality development with good economics.

#agile-in-practice #devops

Git Version Control

How and why we started using Git at the very beginning of our journey with the tool.

#agile-in-practice #devops

Automated Code Checks

Some examples of automated code checks, and when it makes sense to implement them.

#agile-in-practice#automation#clean-code