Challenges of Automated Testing (for telcos)

Recently I was asked for advice regarding automated testing. I spent several years in R&D designing our test framework (for SMSC, MMSC, RCS, IPSMGW, …). Sometimes I felt like I was banging my head against a wall, sometimes it was very rewarding (at least for me it is rewarding to find a bug 🙂 ). There are some lessons learnt I’ll remember for a long time. I’m afraid this post won’t be a short read.

[Image: Fukushima Nuclear Power Plant]

  1. It has to be useful from the beginning

When we start to use a new tool, it requires people to change their way of doing things. The worst mistake is to force engineers to use something they don’t really need. They’ll try the tool once or twice, and if they don’t get the feeling it’s saving their time, they’ll stop using it.

First, this means it has to be simple to use. No one wants to spend too much time learning a tool they haven’t needed so far. And they have to see that it’s really useful.

When we started to introduce our test tool in the test teams, there was always some resistance. Many managers want people to embrace change, but in reality that’s not always the case. And often for a good reason! So we always identified a procedure that had to be repeated often, that people hated to do manually, and that was time consuming (repeating the same test in various environments, installation procedures, upgrade procedures, failovers, configuration of the environment for complex scenarios, etc.).

In the beginning we didn’t have more than 10 automated tests for a given product. But those 10 tests were saving a lot of time and effort. When engineers saw it was saving their time, they became more supportive and didn’t mind investing their time to learn how to use the tool or to write new test cases.

  2. QA is not about finding bugs

Many times it happened to me that someone very important objected to automated testing. It is understandable, because many Project Managers don’t care about long-term results. Why should they invest in something like automated testing when all they need is to meet the deadlines/project goals? They can achieve that without any risk with their testers doing manual tests. They can also claim that when we automate tests, we’ll never discover the issues that are discovered manually. They are right, but that doesn’t mean we don’t need automated tests.

Testing is not about searching for bugs. Honestly, when you work for a few years with some complex product, it is not that difficult to find a bug. More important is to verify that all the crucial scenarios are working fine. Because there is always a bug, isn’t there? 🙂 The key thing is to make sure it is not a serious one. That’s why testing should be focused on verifying the requirements we have on the system. Manual testing is – no doubt about it – useful. But there is no way we can verify thousands of scenarios overnight (in several different environments) manually. That’s impossible, even though we should do it with each new build, right?
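To get a sense of the scale, the nightly verification matrix can be sketched as follows (the environment and scenario names are purely illustrative):

```python
import itertools

# Hypothetical nightly regression matrix: every scenario is repeated in
# every environment, which is exactly what makes manual execution infeasible.
ENVIRONMENTS = ["lab-sigtran", "lab-ip", "staging", "customer-replica"]
SCENARIOS = [f"scenario_{i:04d}" for i in range(2000)]  # e.g. 2000 call flows

def run_scenario(env: str, scenario: str) -> bool:
    """Placeholder for the real test execution; always passes here."""
    return True

def nightly_run() -> dict:
    # Execute every (environment, scenario) combination and collect results.
    results = {}
    for env, scenario in itertools.product(ENVIRONMENTS, SCENARIOS):
        results[(env, scenario)] = run_scenario(env, scenario)
    return results

results = nightly_run()
print(len(results))  # 4 environments x 2000 scenarios = 8000 executions
```

Even at a modest one minute per manual test, that matrix is months of work; automated, it is one night.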

Over ten years ago I worked in R&D on maintenance. There was a big CR (thousands of man-days) which introduced plenty of bugs. Actually the core of the system was crashing even when customers didn’t use the feature at all. Needless to say what that did to our reputation. I went through the test design; it had several hundred pages. Many tests focused on the new feature, and there was some regression testing as well. The problem was that the tests were performed manually. And as the deadline drew near, the testers were so busy retesting the fixed features that there was no time and no resources left to repeat the regression tests.

    The golden rule of testing says: Undertesting is a sin, overtesting is a crime.

That means that when we don’t test enough, we jeopardize the product quality and our customers’ trust in our work. On the other hand, for big products we are never able to cover all the configurations and scenarios, simply because of the curse of dimensionality. So it’s better to focus on the important scenarios than to look for tiny bugs. A customer is usually OK with a minor issue in a new feature, but will never forgive us if a new CR or bug fix corrupts important functionality.

Finally, another rule says: The more tests you have in your regression suite, the better confidence in product quality you can have.

Note that it doesn’t say the more bugs you find, the better the product is. Guess why 🙂

  3. Reliability is the key

A test tool is a product like any other, so it also contains bugs. Again, we have to make sure the tool is in good shape and that there is no serious bug in it. The simplest way to achieve this is to run the regression suite every day, or, in the case of continuous testing, to execute a defined test set with every new commit. And never continue with new development (new test cases, new libraries) until the results are stable.

It might be painful to investigate why some test fails from time to time. It can be a bug in the product, a bug in the tool, a race condition; it can be caused by some other test, by the environment, etc. It may lead to a redesign of the tool or its components. At the end of the day we should have a reliable test suite: we should mark the tests that are failing because of known product issues (expected failures), and we should remove the tests that are not 100 % reliable from the regression suite until we find the root cause and fix it. We always have to be sure that no test or configuration change in the product environment affects the following tests. In other words, tests have to be independent of each other – even when a test fails.
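The bookkeeping described above might be sketched like this (a minimal illustration; the test names and categories are hypothetical): tests failing because of a known product issue stay in the suite but are reported separately, while unreliable tests are quarantined out of the regression verdict until their root cause is fixed.

```python
KNOWN_PRODUCT_BUGS = {"test_mms_retry"}      # failing as expected, keep running
QUARANTINED = {"test_failover_timing"}       # not 100% reliable, exclude for now

def regression_verdict(results: dict) -> dict:
    """Sort raw pass/fail results into the categories the report shows."""
    report = {"passed": [], "failed": [], "known_bug": [], "quarantined": []}
    for name, passed in results.items():
        if name in QUARANTINED:
            report["quarantined"].append(name)   # ignored in the verdict
        elif passed:
            report["passed"].append(name)
        elif name in KNOWN_PRODUCT_BUGS:
            report["known_bug"].append(name)     # product issue, not suite noise
        else:
            report["failed"].append(name)        # this is what must stay empty
    return report

report = regression_verdict({
    "test_smsc_basic": True,
    "test_mms_retry": False,        # known product bug, not a suite failure
    "test_failover_timing": False,  # quarantined, excluded from the verdict
    "test_diameter_auth": True,
})
suite_green = not report["failed"]  # the suite is "green" despite two failures
```

The point of the split is that a red run always means something new: either a real regression or a new flake to root-cause, never known noise.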

I’d say this is the biggest technical obstacle. In our case we even had to introduce a try { } construct in the programming language we used, a few years before it appeared in the language standard. If the test tool is not reliable, it is useless. In the company I worked for, there was originally an automated framework with thousands of call-flow scenarios. The problem was that even when everything was fine, several hundred test cases failed. Who wants to work with such a tool? It is time consuming to investigate the real result, and you can’t trust it completely anyway.

One more thing I’d like to mention here is proper reporting. It has to be easy to see the results, to understand the test scenarios (without knowledge of the tool), to easily identify root causes in case of bugs, to debug both the scenarios and the tool itself, and so on. If you really want an automated framework with more than a few hundred tests, a good software design of the tool is a must. I’ve heard that 9 out of 10 testing frameworks are thrown away because they started as a bunch of scripts. Once again, a test tool is a product like any other.

  4. Step-by-Step

Even when the management of a company wants to invest in a new tool, there is a danger that someone will come up with a long list of requirements covering everything that is supposedly necessary. I’ve seen good ideas destroyed this way more than once.

We know that, at least in the beginning, the tool should be simple to use; otherwise people will have a good reason to avoid it. On the other hand, our long-term goal is to have the best test coverage possible. That implies some degree of complexity.

We have to make sure that at each step the tool is saving our time, that it is still reliable, and that whatever we are not able to implement right now will be possible in the future once we have the right library/functionality in place. That doesn’t mean the tool has to be able to do everything immediately.

    How to eat an elephant?

In the beginning the test coverage is low. Writing a new test case probably takes more time than performing it manually – well, if you perform it only once or twice. Always try to automate first the scenarios that will be repeated during testing. The same applies to the most important scenarios and smoke tests. There is no need to save the world in one day. The important thing is that your regression suite grows every day. With each new automated scenario we gain higher confidence in the result, and running the regression suite doesn’t cost much. The power of regression testing is, firstly, that you know the most important things are working; secondly, that you might discover issues no one thought about; and finally, that you can run the tests for free anytime you need, again and again.

And this is really the key. When there was a new CR, then thanks to reviews, inspections, unit testing, etc., there were mostly very few issues related to the new functionality itself. That’s frustrating for a tester, isn’t it? 🙂 I was always very happy when our regression tests discovered a bug in some completely different functionality. Developers were stunned – how could you find a bug here? We’d never have thought there could be any connection. For example, there was a tiny bug fix in Diameter triggering, and our regression discovered a crash of the SMSC core when an SMS was exactly 127 septets long… tell me you can find that by manual testing. By the way, automated testing is more meticulous: after each test we can verify all the counters, billing, SNMP traps, check logs, traces, etc. In many cases the scenario worked fine, but we found an issue in a log file.
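A post-test verification step of the kind described above could look roughly like this (the counter names and log messages are invented for illustration):

```python
# Illustrative post-test verification: even when the call flow itself
# succeeded, the tool still inspects counters and logs afterwards.

def verify_after_test(counters_before: dict, counters_after: dict,
                      log_lines: list) -> list:
    """Return a list of issues found after a scenario has 'passed'."""
    issues = []
    # Consistency check: every message accepted must also have left the system.
    delta_in = counters_after["sms_in"] - counters_before["sms_in"]
    delta_out = counters_after["sms_out"] - counters_before["sms_out"]
    if delta_in != delta_out:
        issues.append(f"counter mismatch: {delta_in} in vs {delta_out} out")
    # The scenario may have worked fine, yet the core logged an error.
    issues.extend(line for line in log_lines if "ERROR" in line)
    return issues

issues = verify_after_test(
    {"sms_in": 100, "sms_out": 100},
    {"sms_in": 101, "sms_out": 101},
    ["INFO delivery ok", "ERROR unexpected septet count 127"],
)
# The call flow succeeded, but the log check still flags an issue.
```

This is the meticulousness manual testing cannot match: nobody re-reads every counter and log line after every one of thousands of scenarios, but the tool does.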

Note that an automated framework for functional testing is usually not suitable for load or performance testing. Not all frameworks can be used for integration testing, end-to-end testing, multi-site scenarios, etc. We should define what kind of testing our test tool is designed for, and not try to build a tool that can do everything. Being able to do everything usually means not being able to do anything properly.

  5. Keep It Simple, Stupid

I have a friend who works as a software architect. What four developers think through over a few days or weeks, he is able to design and code overnight. The algorithm would be more efficient, the impact on the system lower – simply an ingenious solution. The only problem is that those four developers wouldn’t be able to fully understand it, so they couldn’t make full use of it, extend the functionality, or fix bugs.

In our world, ingenious solutions are not always good or wanted. The simpler solution is the better one. UNIX in the beginning wasn’t the most efficient OS, but it was easy to understand, as it was written in C. Ethernet wasn’t a very efficient technology either. And there are many similar examples. When we write a test tool, it has to be really simple, because we need to understand the underlying technology (e.g. VoLTE, IPSec, Diameter, SIGTRAN, …), the tested product, the scenario, the product/system configuration, the transmitted data – the last thing we need is a complex, hard-to-understand test framework.

Testers should be focused on testing, not on programming. On the other hand, the tool shouldn’t limit what they can test (keyword-based frameworks are a typical example). At least in the beginning it is useful to review the new test cases. That ensures every engineer is able to understand the tested scenario. The more straightforward the scenario and its code are, the easier it is to identify the root cause of a possible issue.

Of course, this list is not complete. We could spend days discussing other aspects of automated testing. But please remember one thing: technically we can achieve nearly anything; it’s the human factor and money that make things fail. Remember the Deepwater Horizon oil spill or the Fukushima nuclear accident? Sometimes we know what is technically wrong, but there are other interests – our project is running out of time, resources, etc. As said, we could test each new feature for a very long time, but practically that is impossible. Also, some PMs don’t care much about quality (maintenance goes on a different budget), and a 5–10 % risk of failure may be gladly accepted. A good regression suite covers your back, and if there is some issue, it shouldn’t be a fatal one.
