Crawler: Spinning the assurance web

You may not know Shahid Ishtiaq’s name, but if you intend to have a career in business assurance, you will inevitably come to know his work. A specialist in telecoms revenue assurance and fraud management, and an innovator by nature, Shahid has already had a profound influence over the output from the TM Forum’s RA team. Now Shahid has turned his attention to ‘CRAWLER’, a new project that has revolutionized the productivity of business assurance analysts working for Etisalat. Shahid has kindly agreed to explain CRAWLER in a special two-part series that was written exclusively for talkRA. This week Shahid describes the research findings that prompted the development of CRAWLER. Next Monday, Shahid will explain how CRAWLER was developed and the benefits it has delivered in Etisalat.

In today’s competitive telecommunications market, companies are finding it difficult to keep up with the pace of change. New and complex products, changes in technology and the collaborative environment have pushed companies to invest more and more in revenue assurance and fraud systems. The primary objective of investing in these systems is to achieve accurate and timely customer billing and to ensure that every partner in the business cycle receives its proper share. The second objective is to maximize the inflow of cash into the business.

If we look at history, the revenue assurance function started with in-house activities, mainly focused on improving the quality and completeness of data. Similarly, in the case of fraud management, a few types of fraud report were designed and usually ran successfully in an in-house system. Later, operators began to feel the need for dedicated revenue assurance and fraud management (RAFM) systems, as they were unable to keep up with the pace of change in the industry. Because of the limits on in-house expertise, purchasing commercial off-the-shelf (COTS) solutions became the norm.

Today, most companies have one or more COTS RA and FM systems. These systems have addressed the majority of RAFM issues and have made life easier for operators. As a result of comprehensive research, these systems have become very mature and are slowly covering all the complex grey areas in companies. However, after using and learning from these systems, experts have realized that the root causes of RAFM issues often remain unaddressed; these systems only help us to detect and fix the symptoms of anomalies. Research is in progress, and experts are developing new strategies every day to better answer the question of how to find the root cause of leakages.


In part one of this article we will focus on the quality and performance of the teams that use RAFM systems to perform their duties. The industry seems quite comfortable and satisfied with detective RAFM systems. Despite comprehensive efforts to transform these functions and make them more preventive, it is really surprising that the need for detective RAFM systems is still increasing. Companies that are not satisfied with their current system seem to be planning to buy new systems from some other vendor. A few companies are changing their system because they feel they learned a lot from their mistakes when implementing the existing system and hope to achieve better results with another vendor. Others are changing systems because their group head office has decided to implement the same system in all subsidiaries.

These days, vendors sell licenses at a low cost and earn through support agreements, change requests and hardware. This pricing model has become very common for COTS systems.


Recently we researched the teams that were using the output of RAFM systems. The ultimate objective of this research was to review the effort analysts still had to put in despite having a system. Our aim was to reduce the effort required from analysts and to increase the performance and quality of their work. Let’s have a quick look at a few steps performed by these systems before the analyst is engaged. Normally an alarm is generated when a defined threshold is breached. Once the alarm is generated, it is assigned to an analyst for investigation. Sometimes, in order to manage the alarms, a proper case management methodology is also used.
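The threshold-and-alarm flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a real RAFM product: the names (`THRESHOLDS`, `Alarm`, `check_metric`) and the example metrics are all assumptions made for the sketch.

```python
# Illustrative sketch of threshold-based alarm generation: a metric is
# compared against its configured threshold, and an alarm record is
# raised only on a breach. All names and metrics here are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Hypothetical per-metric thresholds configured by the RAFM team.
THRESHOLDS = {"sim_box_call_rate": 100.0, "unbilled_cdr_pct": 2.0}

@dataclass
class Alarm:
    metric: str
    value: float
    threshold: float
    status: str = "open"                 # new alarms await assignment
    raised_at: datetime = field(default_factory=datetime.now)

def check_metric(metric: str, value: float) -> Optional[Alarm]:
    """Raise an alarm only when the metric breaches its threshold."""
    threshold = THRESHOLDS.get(metric)
    if threshold is not None and value > threshold:
        return Alarm(metric, value, threshold)
    return None
```

In a real deployment the alarm would then be routed into whatever case management process the operator uses; here it simply carries an `"open"` status.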

Normally, in the case of fraud management, once the technical alarm is assigned to the analyst, he or she then accesses the latest information from several other telecoms systems including billing, IN, CRM, DWH, mediation etc. This information is needed to make an informed decision about what caused the alarm. An interesting fact was observed here: for a specific type of alarm, a “fixed investigation pattern” was usually followed before the decision was made.

The term “investigation pattern” refers to a number of concrete and repeatable steps followed by an analyst. For example, in order to investigate an alarm, an analyst might pick the customer ID and then look up some parameters of the relevant profile in the CRM system. The analyst may then check a few parameters in the HLR, verify some of the usage in the IN system, and so on, before making a decision. Sometimes the analyst also requires input from a source/system owner or a systems expert before making the final decision. In our recent study of fraud management systems we observed that 70% of technical alarms had fixed investigation patterns based on the nature of the case. In the remaining 30% of cases, a certain level of fixed investigation pattern was followed, but the final decision was only made after performing some additional ad hoc steps. This study was based on the technical fraud types in the FM system; internal frauds were not covered.
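A fixed investigation pattern is, in essence, an ordered list of lookups that is the same for every alarm of a given type. The sketch below makes that explicit. The system names (CRM, HLR, IN) come from the article, but the functions are stubs I have invented to stand in for real back-end queries.

```python
# Illustrative sketch of a "fixed investigation pattern": an ordered,
# repeatable set of lookups performed for a given alarm type. The stub
# functions below are hypothetical stand-ins for real system queries.

def lookup_crm_profile(customer_id: str) -> dict:
    return {"segment": "prepaid", "active": True}      # stub CRM answer

def lookup_hlr_status(customer_id: str) -> dict:
    return {"roaming": False, "barred": False}         # stub HLR answer

def lookup_in_usage(customer_id: str) -> dict:
    return {"balance": 12.5, "calls_24h": 340}         # stub IN answer

# The pattern itself is just the ordered sequence of steps.
FIXED_PATTERN = [lookup_crm_profile, lookup_hlr_status, lookup_in_usage]

def investigate(customer_id: str) -> dict:
    """Run every step of the fixed pattern and collect the evidence."""
    evidence = {}
    for step in FIXED_PATTERN:
        evidence[step.__name__] = step(customer_id)
    return evidence
```

Because the pattern is a plain data structure, it is exactly the kind of thing that could be executed automatically rather than clicked through by hand, which is the idea pursued later in this article.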

A similar study was also carried out for revenue assurance systems, but the results were contrary to those found for fraud management. According to our study, only 25% of RA alarms exhibited a fixed investigation pattern. For the other 75% of alarms, the investigation was roughly 60% fixed steps and 40% ad hoc steps that kept on changing. It is also worth mentioning that the performance of an analyst doing manual copy/paste work started to degrade after 5 hours of continuous work. One of the common errors was looking at information from a previous investigation and assuming it was current.

As a next step, a time consumption matrix was built. It showed that two things were consuming most of the analyst’s time: (1) the investigation pattern; and (2) the case/alarm management process. The case/alarm management process usually involves several steps and transitions. In the alarm lifecycle, a set of generic inputs is entered and the alarm moves to the next state, for example “open”, “under investigation”, “working”, “under resolution”, “closed” etc. The case/alarm management is normally generic, and for a specific type of alarm a fixed pattern of options was selected most of the time. Another interesting revelation was that the comprehensive inputs the user makes during alarm management are normally not used in reporting. A basic report on alarms is usually available in the system. However, with a little mining of these alarms, a very comprehensive trend report on a company’s grey areas can be built.
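The lifecycle above is a small state machine, and the "little mining" mentioned can be as simple as counting closed alarms by category. The sketch below shows both, using the state names from the text; the transition rules and the `root_area` field are my own assumptions for illustration.

```python
# Hedged sketch of the generic alarm lifecycle as a state machine, using
# the states named in the article. The allowed transitions and the
# "root_area" field used for mining are illustrative assumptions.
from collections import Counter

ALLOWED_TRANSITIONS = {
    "open": {"under investigation"},
    "under investigation": {"working"},
    "working": {"under resolution"},
    "under resolution": {"closed"},
    "closed": set(),                     # terminal state
}

def advance(current: str, target: str) -> str:
    """Move an alarm to the next state, rejecting illegal jumps."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current!r} -> {target!r}")
    return target

def alarm_trend(closed_alarms: list) -> Counter:
    """Mine closed alarms into a simple trend report on grey areas."""
    return Counter(alarm["root_area"] for alarm in closed_alarms)
```

Even this crude counting over the inputs analysts already make during alarm management would surface which grey areas generate the most alarms, which is precisely the reporting opportunity the research identified.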

The facts from this research were evaluated with a view to increasing the performance of the teams. Multiple options were analyzed, and the vendors of these systems were contacted and asked to provide an option like “validate alarm”. The whole idea was to grab automatically, on a single click, the information which the analyst was currently extracting manually. The response from vendors was not very satisfactory, and most of them were not interested in the performance of RAFM teams. The vendors were of the opinion that such options are outside the scope of the system. Vendors also told us that the generic modules in their systems were not designed to support functionality like this, and that it might require substantial development work in their systems. Another interesting fact was that even an operator’s internal processes and controls do not allow easy communication between the OSS/BSS back ends and the COTS assurance tools, whilst providing specific information in the form of offline dumps was acceptable. After multiple meetings with the vendors, we came away with another opinion: vendors want analysts to work this way, even though it is inefficient. The analysts’ visible struggle indirectly advertises the benefits the system delivers. At the end of the day, an operator’s management team may be pleased to pay heavy annual maintenance and change request fees when they see their internal departments struggling and making efforts to reduce leakage.

In part two, Shahid describes the development of CRAWLER, a solution to the problems described here. Part two will be published next Monday.

From time to time, Commsrisk invites special guests to make an expert contribution.