Andy Crouch - Code, Technology & Obfuscation ...

Why A Professional Approach To Testing Is Critical To Your Product

Photo: Unsplash - Cookie the Pom

This week I read that the Texas Department of Public Safety had to apologise after sending out an Amber Alert featuring Chucky. An Amber Alert is a notification system used in the USA to alert the public about missing or abducted children. Amber Alerts are sent by SMS to cell phones, broadcast by radio and TV stations and posted to the internet. It is a time-sensitive application with high-stakes outcomes.

The message sent by the DPS featured Glen, another character from the Chucky films, as the abducted child. The listed suspect was Chucky, and the alert included a picture of him holding a knife. Oh, and they set his race as “Other: Doll”.

In almost any other context, using these characters might have been seen as funny. Within the context of the Amber Alert system, people were rightly annoyed and dismayed. This is a notification system that, in its first 17 years, rescued over 600 missing children. So, how did the department responsible explain the alert?

It was a test malfunction.

A Real World Story - What Happened

Incidents like this are reported fairly often, and usually the only consequence is embarrassment. That isn’t always the case. Over ten years ago, I was working on an Incident Management Control application. The software allowed clients to manage information and resources during a crisis from one central system. Our clients spanned all kinds of industries. We also had a codebase that was updated and deployed as often as you would expect with CI/CD, but without any of the automated processes or tooling.

A large subsystem of the application handled notifications. We could handle everything from pagers (not joking) through email, text-to-speech and SMS. When I joined, the code had not been designed to allow the notification subsystem to be disabled or tested. After each deployment to the test environment, the database configuration had to be updated by hand (using SQL Server Management Studio, no less) to switch the notifications off.

I think you can see where this is going?

A QA engineer on the team was testing the system and ran a well-documented incident scenario involving two of our clients. Unfortunately, the technical lead had rolled out a set of changes to the test site and had not updated the notification settings.

As soon as the notifications started going out, the QA called to tell me what had happened. I immediately got on to the CEO to start damage control, but it was too late. Even in that short time, both companies had convened their incident teams, which included the whole C-level team on both sides. Key staff had been sent to the incident management centres. In case you are wondering, one of the clients was an airline.

A Real World Story - How Did It Happen

There were a number of failures that led to this event happening.

The first was a process failure. The code had come to us as part of a larger acquisition by the parent group. The original infrastructure for the project was a single Production environment, and the source code control lived on a machine in the original developer’s living room. As the business grew rapidly around the software, no one looked at how things were being done. Manual deploys (even back then) were a drain on time, and the process had to be taught to every new hire. It had multiple steps and left plenty of room for human error.

The second was the failure to build a codebase that allowed for testing. As a side note, there were no automated tests at all, but that is a whole other post in itself. The design reflected the approach of the original developer, who would turn options off at the database level. There was no clear configuration layer in the application that a human (or an automated process) could use to control its behaviour.
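A configuration layer does not need to be elaborate. The sketch below is a minimal illustration in Python (the original application was .NET), assuming settings are read from environment variables; the variable names `APP_ENV` and `NOTIFICATIONS_ENABLED` and the `load_settings` helper are hypothetical. It makes "off" the default and refuses to enable real notifications outside production, so a forgotten manual step fails safe instead of contacting clients.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class NotificationSettings:
    """One place that controls whether the app may send real notifications."""
    environment: str
    notifications_enabled: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> NotificationSettings:
    """Build settings from (hypothetical) APP_ENV and NOTIFICATIONS_ENABLED variables."""
    if env is None:
        env = os.environ
    environment = env.get("APP_ENV", "development").strip().lower()
    # Safe default: notifications stay off unless explicitly switched on.
    enabled = env.get("NOTIFICATIONS_ENABLED", "false").strip().lower() in {"1", "true", "yes"}
    # Guard rail: only production may send real notifications.
    if enabled and environment != "production":
        raise ValueError(
            f"real notifications cannot be enabled in the '{environment}' environment"
        )
    return NotificationSettings(environment=environment, notifications_enabled=enabled)
```

A test deployment that sets nothing gets `notifications_enabled=False`, and one that tries to switch notifications on outside production fails at startup rather than at the moment a scenario is run.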

The third was not listening to the users. In this case, the user was the QA who had been assigned to the project. They had asked for a testing environment, which was created, but they had flagged some issues with it. The first was that it was hosted on the same domain as Production, under a /test path. They felt that, without due care, it was too easy to end up in the wrong environment. The second was that the test system had no visual cue showing it was the test system. The third was that they could not check or control application options from the UI.

Think about that for a minute. A user gave clear and concise feedback about a serious issue, and it was ignored. The outcome could have cost the company not only a considerable amount of money but also serious damage to its brand and reputation.

Needless to say, the points raised by the QA were addressed, and there was an overhaul of how the application was developed and deployed. This was a good thing, but it shouldn’t have taken a failure in a test system to drive it.

How To Approach Testing For Your Project

There are some key takeaways from both my own experience and “The Chucky Alert”. Keep in mind four main reasons why testing is key to software development:

  • Time And Cost Saving. The sooner you find issues, the cheaper it is to fix them, both financially and in resources.
  • Product Quality. Testing confirms that your application works: it provides the value your customers expect and functions as advertised.
  • Security. Testing helps catch known, documented security issues before release and reduces vulnerabilities.
  • Customer Satisfaction. By ensuring your application works well, as specified, your customers are more engaged and satisfied.

When starting a new project, you need to consider how you are going to prioritise testing. Think about using Feature Flags to enable or disable functionality without redeploying the application, and make them accessible to both humans and automation tools. Think about how you design your integrations with various subsystems and APIs. Put these behind a Facade and develop fake implementations for testing that prove the system is working. For example, instead of sending SMS messages to a provider’s API, have a plugin that writes them to a file.
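Here is a minimal Python sketch of that idea: a Facade that checks a feature flag on every call and delegates to whichever SMS sender it was given, plus a fake sender that writes messages to a file. `SmsSender`, `FileSmsSender`, `FeatureFlags`, `NotificationFacade` and the `sms_notifications` flag name are all hypothetical; a real system would more likely back the flags with a database or a feature-flag service so they can be changed from an admin UI or a script.

```python
import json
from pathlib import Path
from typing import Dict, Optional, Protocol


class SmsSender(Protocol):
    """Anything that can deliver an SMS: a real provider client or a test fake."""

    def send(self, phone: str, message: str) -> None: ...


class FileSmsSender:
    """Test fake: appends each SMS to a JSON-lines file instead of calling a provider."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def send(self, phone: str, message: str) -> None:
        with self.path.open("a", encoding="utf-8") as outbox:
            outbox.write(json.dumps({"to": phone, "body": message}) + "\n")


class FeatureFlags:
    """In-memory flag store; any flag that has not been set counts as off."""

    def __init__(self, flags: Optional[Dict[str, bool]] = None) -> None:
        self._flags = dict(flags or {})

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


class NotificationFacade:
    """Single entry point for notifications; callers never talk to a provider directly."""

    def __init__(self, sms: SmsSender, flags: FeatureFlags) -> None:
        self._sms = sms
        self._flags = flags

    def notify(self, phone: str, message: str) -> bool:
        # The flag is read on every call, so it can be flipped without a redeploy.
        if not self._flags.is_enabled("sms_notifications"):
            return False
        self._sms.send(phone, message)
        return True
```

In a test environment you would construct `NotificationFacade(FileSmsSender(Path("sms_outbox.jsonl")), flags)` and assert on the file’s contents; production would pass a real provider client that exposes the same `send` method.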

While it may be tempting to bake test-user logic into your application (I have seen this first hand), it is a bad idea. One application I worked on used a numeric ID for users: if the number started with a 9, it was a test user. As a result, the code was littered with checks for 9 as the first digit of the user ID, each of which then provided a different implementation. (I would have gone for the use of Generics as it was a .NET app, but again, a whole other post.)
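Generics are one way out of this. The sketch below swaps in a different, common technique, dependency injection, shown in Python for illustration: the code that must behave differently under test receives its collaborator from outside, so nothing inspects the user ID. All class and function names here are hypothetical, and the first function is included only to show the pattern being replaced.

```python
from typing import List, Protocol, Tuple


# The anti-pattern: business code decides "is this a test user?" from the ID itself.
def notify_if_real_user(user_id: int, message: str, outbox: List[str]) -> None:
    if str(user_id).startswith("9"):  # magic rule repeated at every call site
        return
    outbox.append(f"{user_id}: {message}")


# The alternative: behaviour is chosen once, by injecting a collaborator.
class Notifier(Protocol):
    def notify(self, user_id: int, message: str) -> None: ...


class RecordingNotifier:
    """Test double: records messages instead of delivering them."""

    def __init__(self) -> None:
        self.sent: List[Tuple[int, str]] = []

    def notify(self, user_id: int, message: str) -> None:
        self.sent.append((user_id, message))


class AccountService:
    """Contains no test-user checks; it uses whichever Notifier it was given."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def close_account(self, user_id: int) -> None:
        # Real account-closing logic would go here.
        self._notifier.notify(user_id, "Your account has been closed.")
```

In tests, `AccountService(RecordingNotifier())` lets you assert on what would have been sent; in production the same service is built with a real notifier, and the leading-9 rule disappears from the business code.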

What this all boils down to is that you should have a separate environment for testing your application. There should be no shared resources, databases or API access. With modern CI/CD automation, containers and very cheap resources on hosted platforms, there is no excuse not to.
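One way to enforce "no shared resources" is a startup check that refuses to run a non-production environment against production hosts. This is a minimal Python sketch; the `check_isolation` function and the host names in `PRODUCTION_HOSTS` are hypothetical placeholders, and a real deployment would more likely load that list from its infrastructure configuration.

```python
from typing import Iterable, Mapping
from urllib.parse import urlparse

# Hypothetical hosts that only the production environment may use.
PRODUCTION_HOSTS = frozenset({"db.prod.example.com", "sms.prod.example.com"})


def check_isolation(
    environment: str,
    resource_urls: Mapping[str, str],
    production_hosts: Iterable[str] = PRODUCTION_HOSTS,
) -> None:
    """Raise RuntimeError if a non-production environment points at a production host."""
    if environment == "production":
        return
    prod = {host.lower() for host in production_hosts}
    shared = sorted(
        name
        for name, url in resource_urls.items()
        if urlparse(url).hostname in prod  # urlparse lower-cases the hostname
    )
    if shared:
        raise RuntimeError(
            f"'{environment}' environment shares production resources: {', '.join(shared)}"
        )
```

Calling this once at application startup, with every database and API URL the environment is configured to use, turns a silent misconfiguration into an immediate, visible failure.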

One final thought: use realistic and professional test data. If everything else fails and, for some reason, your users do see a test, don’t repeat the Chucky situation. Apologising for sending a test message is one thing, but having to explain why you are using potentially offensive test data has a terrible impact on the way your users view your product and team.

If you have any thoughts or comments about how your team approaches testing, then let me know via Twitter or email.