Duplicate Data, Duplicate Costs

You cannot have a debate if you only hear one side of the story. Unfortunately, sometimes you only get to hear one side. A few weeks ago, I visited Gadi Solotorevsky’s blog. He had written a piece which, put simply, argued that revenue assurance vendors like his employer, cVidya, are better than vendors that offer alternatives that will save their customers’ time and money. He did not mention his employer, but that is not the point – his recommendation reflected their best interests. I posted a comment questioning whether this recommendation was motivated by getting the best results for the customer or by increasing the revenues for cVidya by doing unnecessary work. It has not been published. Given that Gadi has only ever received (or at least published) two comments in total, that both came from cVidya colleagues, and that the most recent one was posted 10 months ago, you would think he would be glad to be getting some response, even if it is negative. Not so. Last time I blogged about Gadi’s pro-cVidya slant on revenue assurance, Gadi responded, and I published. You can read Gadi’s comment here. So why is Gadi not returning the favour? I think I know the reasons why.

In Gadi’s post, he argued that revenue assurance data should only come from primary sources. That is a wonderfully purist attitude, if you have an unlimited budget to spend. But it is not good business practice. The theory goes that only data from a primary source can be trusted. All secondary sources, so the argument continues, are less likely to be reliable. Of course, this is not true. Secondary sources may be just as reliable as primary sources. Adopting a simplistic and dogmatic rule makes no sense in the real world of business. RA professionals often use secondary sources of data, and always have. Of course, secondary data may have been manipulated, processed, aggregated, filtered and mediated in all sorts of ways. But then again, the same thing will have happened to so-called ‘primary’ data. Manipulation of data is not intrinsically bad, and usually some degree of manipulation is a necessity for revenue assurance or any other task. The argument goes on that revenue assurance people cannot trust extracts of data if they did not design and control the original extract or the data repository. That is nonsense. Understand the data. If it meets the revenue assurance requirement, then use it. Save the company some money by avoiding the need to create duplicate extracts and duplicate repositories of duplicate data. If you do that, you avoid duplicate costs. There is no science that justifies making an a priori distinction between primary and secondary sources of data. To understand the prejudice, you have to look at the motives of the people who exhibit it.

There are several reasons why most revenue assurance projects avoid using the same data as available elsewhere. The first reason is cultural. It is the “not done here” mentality. With this mentality, if something is not done ‘here’, it cannot be trusted. Excuse me for pointing out something that vendors and RA professionals often do not want to hear, but just because somebody comes from an RA background does not make them better at extracting and maintaining data than somebody who does not specialize in RA. People who work in RA make mistakes too. By the same logic, if somebody is an expert in ETL, but not RA, it does not mean they will make a mess of the ETL for an RA project. Sadly, there are many people in RA who are not experts in ETL, or in other aspects of large-scale data management, and some of the cultural arguments about data integrity seem to play upon the ignorance of RA staff. A second extract of the same data is just as likely to be faulty as the first, if you do not understand the cause of inadequacies with the first extract. If there is a genuine concern that secondary data is unreliable then RA people should help the business understand why its data is unreliable, and, if it is not fit for purpose, help the business to fix it. Taking the attitude of “we don’t do that, we just do this” may help RA to keep a tight focus on priorities, but only at the risk of adding unnecessary additional costs to the business as a whole.

The second reason for duplication of data is control. RA people tend to fear that changes will be made to data, to software, and to hardware, without their knowledge. This will undermine, interrupt or invalidate their work. This is a genuine concern backed by real-life examples, but duplication is only a short-term solution. Duplication of data means duplication of maintenance, which means duplication of effort. At some point or other, an efficient business will question whether the extra costs are justified. Either those costs will be overtly cut, by removing the duplication at some later stage, or the business will continue to waste money on maintaining the duplication, or, worst of all, all budgets will be squeezed but no clear decision made. These budget constraints may lead to errors and problems with either stream of data. Those errors could have been avoided if cost synergies had been realized and effort had been focused in just one place. It is often less risky, in the long term, for businesses to maintain only one version of the same data (with backups of course), and for this data to be used by the widest group of users in the business. The business can get better returns and better integrity by focusing its investment on checking and maintaining the integrity of that one enterprise-wide version of data, than by distributing its money and effort across multiple versions, where each different version is used by a different silo in the business. In addition, wider usage means more people, with varying knowledge and perspectives, are scrutinizing the data on a regular basis. This in turn increases the probability that any errors will be identified and remedied.

In the end, most RA people will undermine any pseudo-technical arguments about independence of data through their own actions. They do so by using the same data for two, three, four… any number of different tasks they may have, so long as they are all under the control of the same RA department. This may be comforting if you think RA has a special place in the business, but it only serves to turn RA into a silo. Silos in business often end up looking like dead ends, as various experienced RA practitioners have found. In contrast, using the same data as other business users helps to ensure RA is integrated into the rest of the business. Using the same data for multiple purposes also makes good and basic business sense. Reuse reduces the cost of software, reduces processing overheads, reduces the need for hardware, and reduces maintenance costs.

The truth is, vendors like cVidya will also use the same data feeds, the same data repositories and so on, for lots of different purposes, when it suits them. So do their rivals. Doing so is not a weakness; it is an advantage. Smart vendors are open about the advantage of using data for multiple purposes to serve the business. In my recent interview with Benny Yehezkel, EVP of ECtel, he used the analogy of Airbus to explain ECtel’s strategy towards data. As Benny explained it, the ECtel strategy is to carry data across the long haul to major hubs, and then shuttle data onwards to the multiple endpoints. This makes good business sense. If cVidya, or any other vendor, had set up a new data repository to do fraud management, and you needed the very same data to also do revenue assurance, would you really insist on, and then pay for, the creation of separate, duplicate, data extracts and repositories? Of course not. In fact, if cVidya, or any other vendor, were pitching for a second project, they would offer a cheaper price by proposing that the customer leverages the data obtained from the first project. Like elsewhere in business, a ruthless examination of commercial priorities will quickly brush aside any pseudo-scientific dogma.

Let me tell you a secret. When I saw that Gadi had started his own revenue assurance blog, I asked him to join the team of authors that eventually became talkRA. He refused. What did Gadi give as his reason for refusing to blog alongside other bloggers? Because he did not want to promote rival vendors. Think about that for a moment. We have a lot of authors here, representing a lot of different commercial interests, and yet Gadi is worried about promoting any rivals to cVidya. Where do you draw a line between promoting revenue assurance and promoting rival revenue assurance companies? You might as well ask where you draw a line between promoting revenue assurance and promoting your own revenue assurance company. Time and again, I see Gadi pushing a point of view that obviously benefits his own company, while ignoring alternatives proposed elsewhere in the RA community. Perhaps he is very lucky to have found an employer that perfectly represents his unbiased advice. But even so, you cannot be afraid of alternative points of view if you want a profession to be healthy. There is no doubt that cVidya is keen to influence the practice of revenue assurance. My concern with many of their recent activities is that they have started to cross a line between seeking influence and seeking to deny others the opportunity to exercise influence. As an accountant, I have found it illuminating to observe the proper, and relatively open, debate between professionals about how accounting standards, ethics and practice should develop over time and be modified to tackle new challenges. The only reason to deny debate is when your goal is to enforce one point of view over all others. This is what I worry has become the norm with cVidya, and people need to distinguish the advice that suits cVidya from the advice really intended to help the revenue assurance community.

There is a third reason for why RA projects include duplication of data, and hence of costs. It helps vendors to make more money. They get paid for doing things, so generally they will recommend doing more, not less. If that means duplicating data extracts and repositories, then the money they get paid is worth just the same in their bank account. So I am not surprised, but I am disappointed, to see Gadi recommending waste, when he writes on behalf of his employer and not on behalf of his customers. What is worrying is that Gadi has previously argued that his blog is not a promotional tool for cVidya. The implication is that Gadi is promoting the interests of all of RA, and not just those of his company. This would seem to fit with his role as the team leader for revenue assurance in the TM Forum. However, this leads to another worry about Gadi’s lack of impartiality. The TM Forum is dedicated to applying the concept of lean business. The central TM Forum model, the NGOSS, is built around this principle. To quote what the TM Forum says about NGOSS:

NGOSS is a TM Forum initiative to drive efficiency in and cost out of the operation of telecom networks. NGOSS enables service providers to change the way they think about their business and operations…

Through an integrated system of business and technical elements, NGOSS allows OSS/BSS systems to become interoperable like never before.

In other words, the very purpose of the TM Forum is to find ways to cut out inefficiencies like systems that do not operate together, silo processes, duplication of data… and the kind of poorly-integrated systems that Gadi is suggesting should be the basis of single-purpose revenue assurance. If RA is to survive in the long run, long after cVidya’s venture capital funding has run out, it needs to be based on efficiencies, not the wasteful exploitation of telcos to make a quick buck. That is the other side of the story. At least at talkRA, you are guaranteed to hear it.

Eric Priezkalns
Eric is the Editor of Commsrisk. Look here for more about the history of Commsrisk and the role played by Eric.

Eric is also the Chief Executive of the Risk & Assurance Group (RAG), a global association of professionals working in risk management and business assurance for communications providers.

Previously Eric was Director of Risk Management for Qatar Telecom and he has worked with Cable & Wireless, T‑Mobile, Sky, Worldcom and other telcos. He was lead author of Revenue Assurance: Expert Opinions for Communications Providers, published by CRC Press. He is a qualified chartered accountant, with degrees in information systems, and in mathematics and philosophy.

6 Comments on "Duplicate Data, Duplicate Costs"

  1. Hi Eric,

    Don’t be offended that Gadi has ignored your post on his blog – he has been ignoring my request to join the TM Forum Revenue Assurance Group for months now.

    On the subject of primary data sources and reconciliations, I have never seen an order management reconciliation using a primary data source – the physical contract. In fact there are numerous reconciliations where the primary data source cannot be used, either for practical reasons or because it is simply impossible.

  2. Mike Willett | 4 Oct 2008 at 11:20 am |

    Hi Eric,

    Let me just address the data issue.

    First you need to determine the purpose of the data you are acquiring. If you are going to do a deep dive analysis, then I have a clear preference for data as close to “source” as possible. Nothing damages RA more than a “leakage” found in a data warehouse when it doesn’t actually exist in the operational system. Discounts are a classic example: the customer may have all sorts of discounts they are not entitled to sitting on their customer profile, but when the billing system calculates the discount to be applied, it has the correct logic and ignores the erroneous ones (see the sketch at the end of this comment). My view then – sample data as close to source as possible for detailed data investigations.

    However, if you are acquiring data to build a dashboard and want to monitor trends then a “shared” data source is fine. A dashboard for RA doesn’t need to find the leakage, just alert that something may not be right and prompt a detailed investigation as above. So for continuous trending data – summary data from a data warehouse or equivalent is sufficient.
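    A minimal sketch of this kind of check follows; the data structures, field names and the one-line entitlement rule are invented purely for illustration, not taken from any real billing system:

    ```python
    # Hypothetical illustration of the discount scenario above: a customer profile
    # may carry discounts the customer is not entitled to, but if the billing logic
    # never applies them, no revenue actually leaks. All names are invented.

    def billing_would_apply(discount, customer):
        """Simplified stand-in for the billing rule: a discount is only applied
        at bill time if it matches the customer's current plan."""
        return discount["plan"] == customer["plan"]

    def classify_unentitled_discounts(customers):
        """Split unentitled profile discounts into phantom issues and real leakage."""
        phantom, real = [], []
        for customer in customers:
            for discount in customer["profile_discounts"]:
                if discount["entitled"]:
                    continue
                target = real if billing_would_apply(discount, customer) else phantom
                target.append((customer["id"], discount["code"]))
        return phantom, real

    customers = [{
        "id": "C001", "plan": "GOLD",
        "profile_discounts": [
            {"code": "LOYALTY10", "plan": "SILVER", "entitled": False},  # ignored by billing
            {"code": "BUNDLE20",  "plan": "GOLD",   "entitled": False},  # actually billed: leakage
        ],
    }]

    phantom, real = classify_unentitled_discounts(customers)
    print("Phantom issues (no revenue impact):", phantom)
    print("Real leakage to investigate:", real)
    ```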

  3. Hi Mike,

    You make a persuasive argument, but surely the risk of a phantom leakage due to flawed analysis of data in a data warehouse needs to be measured against the risk that a new extract of data, created specifically for revenue assurance, may also be flawed? My view is that this debate should ultimately be decided based on empirical observation. Like all debates about risk, the evaluation is not about risk alone but risk versus expenditure. In other words, is the money spent on implementing and maintaining an additional data extract for RA a better bet for avoiding errors than if the money was spent on modifications to the data loaded to the data warehouse? I have an open mind about this, having seen mistakes derived from data warehouses and seen RA teams implement flawed extracts. To establish a general rule would require long run observations based on practical experience – both positive and negative – of both approaches. Unfortunately, my suspicion is that in this area, like others, the information made public will tend to be skewed and is unlikely to be complete.

  4. Hi Gents,

    Though everyone makes a good argument for their preferred approach, I have my own view when it comes to primary and secondary data sources.

    Like Mike, I also believe in data which is as close to the source as possible. Now, the reason behind this is not to get “flawless data”, but because I believe in performing a complete and comprehensive veracity check right through the entire value chain. There is value, at least in usage assurance, in starting from the actual files output by the network elements – for example, checking the actual file sequences, inter-record time gaps, record sequences and so on (see the first sketch at the end of this comment).

    Now, I’m not saying that the downstream systems are useless in RA without the corresponding “source” data. But here I would follow a slightly different approach. In the downstream systems, after reconciliation with the “primary” data source, I would be checking the effectiveness of business rules, looking for logical gaps, and comparing expected output against actual output.

    Dave also makes an interesting point, as far as order management systems go. Here (as in subscription-related areas) we may not focus on a single “primary” data source. The main reason is that the order flow may go to a single system or may be a simultaneous update across multiple platforms. In the case of subscription assurance, we should not be checking for “primary” vs. “secondary” reconciliations, but ensuring data integrity across all related platforms (see the second sketch at the end of this comment). Unlike usage assurance (i.e. actual xDRs), here we have to atomically reconcile all line items, from perhaps the HLR right through to the billing systems.

    Simply put, at each stage (and furthermore, in each functional area) we should be implementing checks which confirm the functional/logical processing aspects of that particular system. Each stage in the value chain has an important role (and I would say a unique role) in ensuring complete and effective RA.

    We cannot adopt a single rule of thumb to be implemented across all systems simply because it doesn’t make sense to do so. Without even getting into the nuances of content/service providers, even within the limited scope of simple usage and subscription assurance for voice services, we can see how we need to adopt different strategies.
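    A minimal sketch of the file-level checks mentioned above; the sequence numbers, timestamps and the 30-minute gap threshold are invented purely for illustration:

    ```python
    # Hypothetical sketch of "close to source" usage checks: confirm CDR files
    # arrive in an unbroken sequence, and flag unusually long silences between
    # consecutive record timestamps. All values are invented.

    from datetime import datetime, timedelta

    def missing_file_sequences(file_seq_numbers):
        """Return the sequence numbers missing from the run of files received."""
        expected = set(range(min(file_seq_numbers), max(file_seq_numbers) + 1))
        return sorted(expected - set(file_seq_numbers))

    def suspicious_record_gaps(timestamps, max_gap=timedelta(minutes=30)):
        """Return pairs of consecutive timestamps separated by more than max_gap."""
        ordered = sorted(timestamps)
        return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]

    # Example: file 10042 never arrived, and there is a two-hour silence in the records.
    files = [10040, 10041, 10043, 10044]
    records = [datetime(2008, 10, 6, 9, 0), datetime(2008, 10, 6, 9, 5),
               datetime(2008, 10, 6, 11, 10)]

    print("Missing CDR files:", missing_file_sequences(files))
    print("Suspicious gaps between records:", suspicious_record_gaps(records))
    ```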
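    And a minimal sketch of the cross-platform consistency check for subscription assurance; the platform names, subscriber identifier and service lists are invented purely for illustration:

    ```python
    # Hypothetical sketch of the cross-platform integrity check described above:
    # rather than naming one system as "primary", compare the services recorded
    # for each subscriber across all platforms and flag any disagreement.

    def subscription_mismatches(platforms):
        """Yield (subscriber, {platform: services}) wherever the platforms disagree."""
        all_subscribers = set()
        for records in platforms.values():
            all_subscribers.update(records)
        for sub in sorted(all_subscribers):
            views = {name: records.get(sub, frozenset()) for name, records in platforms.items()}
            if len(set(views.values())) > 1:  # not all platforms hold the same view
                yield sub, views

    platforms = {
        "HLR":          {"34600000001": frozenset({"VOICE", "DATA"})},
        "PROVISIONING": {"34600000001": frozenset({"VOICE", "DATA"})},
        "BILLING":      {"34600000001": frozenset({"VOICE"})},  # DATA is not being billed
    }

    for subscriber, views in subscription_mismatches(platforms):
        print("Inconsistent subscriber:", subscriber, views)
    ```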

  5. Mike Willett | 6 Oct 2008 at 11:08 am |

    Hi all,

    Good debate this.

    To Eric’s email first – I will make an assumption here that RA people want “data” as close to source as possible. I interpret Eric’s argument as being that if a data warehouse can be proven to be the same as (or very close to) the source, then RA should tap in there (correct me if I’m wrong, Eric). However, data warehouses are rarely designed with RA in mind – fields RA may want to use are dropped, data is normalised, ETL is applied to data before it gets into the data warehouse, data sets may not be complete, and so on. What RA can consider doing is acquiring the inbound feed, either from the operational system or into the data warehouse, and then it has the opportunity to filter the data as it sees appropriate.

    Turning to Ashwin’s notes, I agree with the approach to usage reconciliation. Business rules will be applied to data in operational systems – one key job of RA is to ensure the intent of the business rules has been correctly coded and put into the production environment. This is good RA practice and, for me, better RA practice is then to ensure that those business rules make sense and that revenue is not being thrown away through rules that were not carefully considered or that may have changed as the nature of the products offered changes.

    In relation to the primary vs secondary issue, I think you need to consider defining a primary somewhere. If it is in order processing, then the CRM system may need to be seen as the “gold standard” against which everything downstream involved in orchestrating the completion of the order, through OSS and into the network layer, is compared. Why I say CRM is because this is what the customer experience is about and presumably what you have told the customer they will get. You can then see what all the other systems say relative to this and you can base your correction activity on the “gold standard”. For example, if you sell an ADSL customer a 1Mb connection and OSS and the network say they have a 2Mb line, what corrective action will you take? I would be looking to fix OSS and the network.
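    A minimal sketch of that “gold standard” comparison follows; the system names, order identifiers and line speeds are invented purely for illustration:

    ```python
    # Hypothetical sketch: treat the CRM order as the reference ("gold standard")
    # and report every downstream system (OSS, network inventory) that disagrees,
    # since those are the records to correct. All names are invented.

    def mismatches_against_crm(crm_orders, downstream_systems):
        """Yield (order_id, system, crm_value, downstream_value) for each discrepancy."""
        for order_id, crm_speed in crm_orders.items():
            for system, provisioned in downstream_systems.items():
                other = provisioned.get(order_id)
                if other != crm_speed:
                    yield (order_id, system, crm_speed, other)

    crm_orders = {"ORD-1": "1Mb"}                    # what the customer was actually sold
    downstream_systems = {
        "OSS":     {"ORD-1": "2Mb"},                 # over-provisioned: correct the OSS record
        "NETWORK": {"ORD-1": "2Mb"},                 # over-provisioned: correct the network
    }

    for mismatch in mismatches_against_crm(crm_orders, downstream_systems):
        print("Correct downstream to match CRM:", mismatch)
    ```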

  6. Hi Mike,

    I think we may actually agree – in a roundabout kind of way. To begin with, I do not disagree with anything you say about data being manipulated before loading to the data warehouse. There is hence the risk that if RA uses data from a data warehouse in a naive way, then it will make mistakes. The key difference between our positions lies in our estimation of the risks involved. You highlight the risks involved in integrating the support of RA with support for other business users, but not the risks that come with keeping RA separate. You point out that data warehouses are rarely designed with RA in mind. The difference between our attitudes comes down to whether you think that failure is inevitable or not.

    I would not dispute that, historically, data warehouses were not designed with RA as a consideration. But that does not mean that data warehouses cannot be, and are not being, designed with RA in mind today, or that they will not increasingly be designed to support RA in future. We are now seeing telcos that consciously plan to realize RA through their corporate data warehouse. This makes perfect sense for the business: the functional elements are the same, whether the data is managed in a collective data warehouse for the whole company, or in a separate repository for RA. By the same token the ‘RA’ vendors are re-positioning themselves as ‘BI’ vendors. They are subtly shifting from single-purpose solutions to multiple-purpose solutions. A single warehouse would be the ultimate bedrock for multi-purpose solutions, so vendors are already moving in the direction of a data warehouse style of approach, although they are careful to move at a pace that suits them as vendors.

    The reason for separately managing RA data comes down to cost, scale and convenience. If it is costly, hard and inconvenient to include the data needed for RA in a corporate data warehouse, it may be cheaper, easier and more practical to recreate similar functional elements and support RA’s analysis of data using a stand-alone system. This was true in the past. However, we all know that in the world of computing, the equations involving cost and capability are not constant. Moore’s Law tells us we keep getting more and more processing power at lower cost. As each year goes by, there are fewer and fewer advantages gained from filtering and summarizing data in the data warehouse. Put simply, data warehouses keep getting better and better at handling large volumes of raw data. I wrote a while back about how even the process of getting data to the warehouse is being transformed, with the rise of ELT as a viable alternative to ETL. If data warehouses are being used to store large volumes of raw data, this begs the question of whether RA, which needs that raw data, is being inefficient, and more error-prone, when it collects and processes that data using a separate and parallel architecture.

    Think of it this way. Suppose overnight you were given a new job: Director of Data Warehousing and Business Intelligence. You are responsible for the data warehouse, and for business budgets relating to managing data in general. Can you trust yourself to design the data warehouse with RA in mind? I hope you can ;) What then if you looked at the decisions you face, based on risk and Total Cost of Ownership? If you could support RA from the data warehouse, and it was the cheaper option, would you take it? Or would you follow Gadi Solotorevsky’s recommendation, and spend more money on keeping RA stand-alone, as a point of principle?

    You can look at the history of computing as a constant tension between centralization and distribution. Data and processing power may be closer to the user, or closer to some central machine. It can be all at one end, all at the other end, or there may be a mix. In each major shift, you move either towards centralization or decentralization. Mainframe computers gave way to the distribution of power in networked PCs. Now the tide is going the other way, towards centralization. Instead of running your Microsoft software on your local PC, Google and everyone else is driving a trend to do everything over the internet. Telcos are not ignorant of this transition, as investment decisions and profit margins will depend on how and when telco networks are used to enable computers to talk to one another. The question also applies within corporate environments, including telcos. Just as in the days when PCs first started to rival mainframes, decisions need to be made about how to distribute processing power and data across the corporate environment. One of the key debates for the evolution of RA will come down to whether it picks a “side” in this battle, or tries to remain neutral. The mainframe guys had their reasons to talk up the benefits of their approach, and to talk down the use of PCs (concerns about security, control, consistency, etc.). Some of those arguments were not driven by the best interests of the business, but were influenced by the interests of the suppliers, and the interests of the people who felt their job was threatened or who saw new opportunities open up. The same is true of RA. There will be plenty of guys with a vested interest in solving RA using a relatively stand-alone approach. There will be other guys, disruptors in the market, who will propose that RA is delivered as a component of corporate-wide solutions. The vested interests will be most active in trying to influence the debate. However, if you are interested in RA as a discipline, and not because you are an investor or employee in one of the traditional RA vendors, then there is no reason to pick a side. It is better to stand back and judge each approach based on the results it delivers. Better still, do not look back in time, but look at the most recent results.

    From a pure risk perspective, I think most of the RA world is in danger of not just picking a side in a battle between centralization and distribution, but of picking the wrong side. I base this fear on an observation. Over the last year, I have increasingly noticed projects where a data warehouse mentality has been successful in delivering RA. I have no vested interest in the debate, as I do not make money either way (I mostly do the kind of job that involves talking and listening and thinking, not the type that involves implementing architectures to crunch data). Since I started to suggest that the data warehouse approach is gaining ground, I have seen a lot of defensive responses. It is not an accident that the strongest feelings come from those who make more money if RA is a separate solution. I wonder if it is just a coincidence that RA has emerged during an era when centralized corporate data management could be seen as more error-prone than solutions that are dedicated to a single department. In the early days, centralized automated processing was a clear step up from manual processing. In the future, better engineering and increasing processing power and storage capacity per dollar may mean centralized processing will once again be less error-prone than multiple unconnected solutions. However, cost and complexity in our current era have so far created an environment where parts of a business may be better served by unconnected solutions. RA is one of those departments, with particular and unusual needs and a low degree of cultural interaction and influence. RA has hence often been better served by going it alone. But its job is not to implement technological solutions that keep certain kinds of people employed, any more than companies had a responsibility to keep mainframe guys in a job when PCs were better at getting the job done.

    RA has done better in the past by going it alone, but stand-alone RA may not make good business sense in future. Arguments about mainframes being better for security and control were all ultimately overcome. The same could happen for any arguments from “first principles” that say the risks and costs favour keeping RA separate from the corporate-wide data processing architecture. I want to see empirical evidence that supports that argument, not some inappropriate analogy to journalism or anything else. In the last year, I think I am seeing more empirical evidence that RA can be successfully and cheaply delivered via a warehouse-centric approach. There have been recent successful projects that involved federating RA within corporate BI. If the federators continue to get good results, at lower costs, then they will eventually prevail over the RA separatists. Think of this as being the point in time where we saw the arrival of the thin end of a very thick wedge. The danger for RA is that, if RA is made to be synonymous with one side in this battle (e.g. “independent is always better!”) then, if that side loses, RA will lose too. RA needs to stand back from decisions about how best to process data – which will change over the years, as technology changes – and instead focus on its mission and purpose. RA professionals, if they want to earn the title of professional, must keep the ultimate business objective in mind, and remain neutral and open-minded about the best ways to realize that objective. The best way to do this is to encourage debate and allow all comers a voice, not just those with a vested interest. On that point, at least, hopefully we can all agree that we are achieving it here!

Comments are closed.