RA – suffering a crisis of confidence?

In future, talkRA will be inviting distinguished guests to write one-off blogs for the site. This first guest blog is by Mark Yelland, consultant and co-author of ‘Revenue Assurance for Service Providers’.

I must be missing something. Is revenue assurance more safety critical than aerospace, or does it require higher reliability than a satellite? It must be, because both of those industries have been using sampling strategies without problems for years, and yet one never hears about sampling as an approach within RA. I use these examples because both have had high-profile failures, and yet neither has moved away from sampling as an approach.

So let us consider some of the benefits of a sampling approach for usage.

For usage-based products, using a recognised sampling plan such as US Military Standard 105E (MIL-STD-105E), a batch of over 0.5 million records needs a sample of just 1,250 to achieve an acceptable quality level better than 0.01%, or 100 parts per million, which is probably good enough for RA. But that is less than 1% sampling; surely that can’t be right? According to well-established sampling theory, tried and tested over decades, it is. OK, you may not be comfortable that there is only a 95% confidence level that your sample will detect all errors above 0.01%, but you can always calculate the sample size required to deliver the confidence level you need. Even at 10,000 samples per batch, that still represents a significant drop in the volume of data to be processed.
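The calculation behind that confidence figure can be sketched in a few lines. This is not the MIL-STD-105E tables themselves, just the underlying binomial reasoning: how big a random sample must be before it is likely to contain at least one error at a given error rate. The function name and figures are illustrative.

```python
import math

def sample_size(error_rate: float, confidence: float) -> int:
    """Smallest n such that a random sample of n records has at least
    `confidence` probability of containing one or more errors, assuming
    errors occur independently at rate `error_rate`."""
    # P(sample is error-free) = (1 - p)^n, so we need
    # 1 - (1 - p)^n >= confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - error_rate))

# To have a 95% chance of catching an error rate of 1 in 10,000
# (0.01%), roughly 30,000 records are needed, however large the batch:
print(sample_size(0.0001, 0.95))
```

Note that the required sample depends only on the error rate and the confidence sought, not on the batch size, which is why sampling well under 1% of a half-million-record batch can still be statistically sound.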

With a smaller volume of data to process, the analysis speeds up and issues become visible sooner, so the potential impact of any problem is certainly no greater, and potentially smaller, than with 100% sampling.

With less time spent performing data analysis, the analysts can focus on non-system-based revenue assurance, such as prevention, audit or training.

But what constitutes a batch? There are a number of ways to define one: all calls from one switch in a day, all calls terminating within a time band, all TAP calls in a day, all wholesale calls in a day, and so on. Or you might argue that the process is continuous, in which case there are sampling plans for that too; I wanted to keep the discussion simple. In all cases, the only requirement is to make a case for a definition of a batch that you are happy to justify.

How do you take a random sample from the batch? Again there are a number of different approaches, for example capturing every nth call that is terminated. Because you have no control over the start time, duration, call type or destination, and the traffic is representative of the distribution of traffic on the switch or network, this is close enough to a random sample not to compromise the findings. Again, I am happy to discuss or respond to challenges on this.
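The every-nth-call idea can be sketched as a systematic sample over a stream of call records. This is a minimal illustration, assuming records arrive as an iterable; the record source is hypothetical.

```python
import itertools
import random

def systematic_sample(records, n, offset=None):
    """Yield every nth record from a stream of call records.
    A random starting offset avoids always picking the same phase
    of any periodic pattern in the traffic."""
    if offset is None:
        offset = random.randrange(n)
    return itertools.islice(records, offset, None, n)

# e.g. a 1-in-100 sample of a day's terminated calls:
# sample = systematic_sample(call_record_stream, 100)
```

A systematic sample like this behaves like a random sample provided the stream has no periodicity aligned with n, which is the representativeness argument made above.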

So what are the downsides?

Some organisations utilise the RA system to help with Sarbanes-Oxley compliance. I am not an expert, but my expectation is that a process capable of detecting errors at the 1-in-10,000 level would probably be considered a suitable tool, given the level of errors in other parts of the business.

The data is used for business intelligence reporting, so it needs to be 100% complete. But business decisions are not based on whether an answer is 5.01 or 4.99, or whether a trend is up by 1.01% or 0.99%; they are based on more significant gaps, for example 5 versus 1, or 1% versus 3%, simply because the uncertainty about external factors makes reporting to that level of detail pointless: the usual argument about the difference between accuracy and precision. The probability is that the sample will provide the accuracy required to make business decisions.
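To make the accuracy-versus-precision point concrete, here is a sketch of the confidence interval you would get on an error rate estimated from a sample, using the standard normal approximation for a proportion. The sample counts are invented for illustration.

```python
import math

def proportion_ci(errors: int, n: int, z: float = 1.96):
    """95% normal-approximation confidence interval for an error rate
    estimated from a sample of n records containing `errors` failures."""
    p = errors / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), p + half

# 3 errors found in a 10,000-record sample:
low, high = proportion_ci(3, 10_000)
# The interval runs from roughly 0% to 0.06%: wide in relative terms,
# but far below the whole-percentage-point differences that actually
# drive business decisions.
```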

We want to see that all the records are being captured. But RA is about balancing costs against risks: if the increase in your operational cost exceeds the predicted value of calls missed through using sampling, then you are acting against the best interests of your business. And with the drive to lower the prices of calls, this equation moves further away from 100% sampling over time.
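That cost-versus-risk equation can be sketched as a back-of-envelope comparison. Every figure below is an assumption invented for illustration, not industry data, and the leakage estimate is deliberately crude.

```python
def net_benefit_of_sampling(records_per_day: int,
                            error_rate: float,
                            avg_call_value: float,
                            cost_per_record: float,
                            sample_fraction: float) -> float:
    """Daily saving from processing only a sample of records, minus a
    crude worst-case estimate of leakage in the unsampled records
    (assumes errors in unsampled records are never recovered)."""
    saving = records_per_day * cost_per_record * (1.0 - sample_fraction)
    exposure = (records_per_day * (1.0 - sample_fraction)
                * error_rate * avg_call_value)
    return saving - exposure

# Illustrative assumptions: 10m records/day, a 1-in-10,000 error rate,
# $0.05 average call value, $0.00001 processing cost per record, and
# sampling 1% of records:
net = net_benefit_of_sampling(10_000_000, 0.0001, 0.05, 0.00001, 0.01)
```

Falling call prices shrink `avg_call_value`, and with it the exposure term, which is why the equation tilts further towards sampling over time.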

As yet I have not heard a convincing argument that sampling on usage is not valid, but I am open to offers.

I would not elect to use sampling on non-usage or standing data; there are few benefits to be gained, primarily because the volumes involved are usually considerably smaller than for usage data and the rate of change is slower, so 100% reconciliation on a periodic basis works for me.

The real problem is that people are reluctant to accept sampling theory. They create individual scenarios to justify not using sampling without applying the check: how realistic is that scenario, and what would the potential cost be? It is a confidence issue. Have confidence that the mathematics that has been used for many years is as valid today as it always was, and be prepared to defend your position.

And just to make life interesting: if you accept that sampling is the correct approach, then the argument about real-time RA disappears, which is why you are unlikely to find vendors pushing sampling.

I am not anti-tools; I am strongly in favour of tools that can be easily used and are affordable to the smaller player. Using tools in a smarter manner has to be the right approach.

A final note: using test call generators is not random sampling; it is using a control sample to monitor outcomes.
