This is another addition to our ongoing Agile in Practice series. You can get up to speed on where we were by looking at where we began.
Issue
Background: The State of the Product
Nuve had developed an increasingly complicated product that linked code across 3 different codebases: Front End Interfaces, Back End Orchestration, and SAP Systems running on cloud infrastructure.
We had done a good job writing automated tests to check functionality within each codebase, which kept the number of bugs low within the front end code, back end code, and SAP systems.
Background: Our Git Workflow
We were using GitHub Flow, which essentially was:
- Our `main` branch was regularly deployed to production
- Our `feature` branches were always branched off of `main`, then merged back into `main`
- Automated tests were run prior to releasing `main` to production after `feature` branches were merged in
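For readers who want to see the concrete commands, GitHub Flow day to day looks roughly like this (a minimal sketch; the branch name `new-feature` is a placeholder and the final deploy step is shown as a comment, since deploy tooling varies by team):

```bash
# Start a feature branch off of the latest main
git checkout main
git pull origin main
git checkout -b new-feature

# ...commit work and open a pull request; automated tests run against the branch...

# Once tests pass and the pull request is approved, merge back into main
git checkout main
git merge --no-ff new-feature
git push origin main

# main is then released to production
```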
The Issue that Arose
While our automated tests would pass, signifying that our Back End, Front End, and SAP Systems were essentially bug free, we would sometimes see user-facing bugs after release. That’s because, while the code within each codebase was well tested and therefore largely bug free, the interactions between codebases were not always well tested.
For example: A user might do something that wasn’t allowed - something that should show a helpful error message. Our back end codebase would send an error message to the (separate) front end codebase, but the front end didn’t know how to handle that message. So it showed nothing to the user, and the user would get stuck. They got no feedback that they had done something incorrectly, which led to frustration.
There isn’t really a way to write a unit test for this type of bug because the issue sits at the intersection between two codebases. It’s an “integration test” issue because it arises at the integration point between two systems.
Solution
Agile Brainstorming: Options for Solving
We considered a number of possible solutions:
Possibility 1: Write Automated Integration Tests
Good idea, but in practice, we had seen automated integration tests become very time-consuming to build and maintain, so we filed that away as something we might do in the future.
Possibility 2: Test all Integrations in Your Isolated Development Environment
We decided to do this, because it’s a good practice to test your own code end to end. But it wasn’t quite enough to give us peace of mind because a) it put all of the responsibility onto one developer to think of every test case, and b) it didn’t allow us to test complicated releases that could cause issues.
Possibility 3: Create a Production-Like Quality Assurance (QA) Environment
Basically, we would clone our production environment (with the exception of sensitive production customer data), and we would release changes there prior to releasing them to production.
This was the idea that really met our needs. It enabled us to do the following:
- Test our release to a production-like environment to ensure our production release would go smoothly
- After releasing to QA, all developers could test each other’s work prior to the production release - more eyes meant more opportunities to catch bugs
- And it had additional benefits:
  - If you had never released to production before, this was a good place to practice
  - If you had to make changes to infrastructure (e.g. web or database servers), you could test those changes before making them in production, reducing the chances of an infrastructure issue
Implementation: A New Git Workflow
This led to an interesting question: Should we just release our `main` git branch to QA and then later release it to production after testing on QA? Or should we do something else?
There were good arguments in both directions, but given how easy it is to merge one branch into another using GitHub, we settled on creating a new `develop` branch, which led to the following equivalency:
- `main` branch = Production
- `develop` branch = QA
In other words, everything that is on `main` should be well tested enough to release to production (and production releases should happen quickly after changes are merged into `main`).
Also, everything that is on `develop` should be well tested enough to release to QA (and QA releases should happen quickly after changes are merged into `develop`).
In summary, our git workflow became:
- `develop` is branched off of `main`, and a production release is merging `develop` back into `main` and deploying to production
- `feature` branches should be branched off of `develop` and merged back into `develop` for release to QA
Simply put, code started to flow like this:
`feature-branch` (Dev environment) → `develop` (QA) → `main` (Production)
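In terms of day-to-day git commands, the new workflow looked roughly like this (a minimal sketch; the branch name `new-feature` is a placeholder and the deploy steps are shown as comments, since actual deploy tooling varies):

```bash
# Feature work branches off of develop
git checkout develop
git pull origin develop
git checkout -b new-feature

# ...build and test the feature in your isolated development environment...

# Merging into develop corresponds to a QA release
git checkout develop
git merge --no-ff new-feature
git push origin develop
# -> deploy develop to the QA environment

# A production release merges develop back into main
git checkout main
git pull origin main
git merge --no-ff develop
git push origin main
# -> deploy main to production
```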
Implementation: When Do We Merge Features to QA?
We suggested the following readiness principle:
If you merge a feature into `develop` (i.e. release to QA), you should be very confident that it’s tested well enough that you will be able to merge it into `main` and release to production within 24 hours.
Essentially, it’s bad form to break QA for a long time because you didn’t test well enough in your own isolated system. That behavior blocks other people, and while they have their own isolated systems, they can only develop in isolation for so long. They are pushing to integrate their code into QA in prep for a production release.
Results
This change, arguably more than anything else, reduced the number of user-facing bugs in production releases. The bugs became obvious when we tested our changes in a near-identical-to-production environment.
The change also enforced a degree of discipline leading up to releases. We knew that QA had to be “stable” prior to starting a production release, so people could rally to make that happen if there were bugs on QA prior to a planned release. We also could delay a release (generally for less than 24 hours based on our readiness principle) if we knew QA wasn’t stable.
As a final comment, it might sound like this adds some overhead/maintenance costs to the development process. That is true, but we’d argue that it’s a “good” kind of maintenance cost. Because QA is so similar to production, arguably, every issue that arises on QA could have arisen in production instead if we didn’t see it in QA first! In our opinion, it’s better to have those issues arise in QA than in production and to get valuable practice fixing them.