Our story begins with Farmer Joe, who has a beautiful daughter, Janet. Farmer Joe decides to bequeath the family decorative pin to Janet on her 20th birthday. Now Janet, in a fit of unbridled joy, decides to run around a haystack holding the pin towards the heavens. Suddenly, in a scene far too clichéd to be coincidence, she trips and the pin falls into the haystack…
Now begins the daunting task of “finding the pin in the haystack”. Janet is faced with a dilemma which would be quite familiar to RA analysts the world over – how do we find the pattern which highlights the root cause (or the pin, if you are a farmer’s daughter who goes by the name of Janet) within a world of millions of CDRs?
Of course, the solution is to cut the haystack into smaller cubes and search smaller segments for the pin. Does this sound familiar to you, my RA analyst friend? It should – because this is the way we attempt to find the root cause today. When your system presents you with millions of CDRs (or, God forbid, meaningless summaries), we tend to break them into smaller sets which have seemingly similar patterns. Then begins the back-breaking task of finding the elusive pattern that indicates the root cause – an endeavor that involves quite a few cups of strong coffee, pointless mails and shattered dreams regarding deadlines and analyst efficiencies.
But hey, this is how we do Root Cause Analysis the world over, right? We reduce our effort by managing the problem size, right? Well, it gives me great pleasure to say that the winds of change are blowing. Today, I would like to introduce you to a fundamental paradigm shift in Root Cause Analysis which will effectively transform the way we do RA.
Let’s imagine for a second that Janet decides to find the pin by placing a powerful magnet over the haystack. Consider how much time and effort she saves, as compared to breaking the large haystack into smaller stacks. Consider how sure this solution is, as compared to the possibility of not finding the pin even after breaking the haystack into smaller segments.
Now imagine such a magnet for RA. A magnet that presents the analyst with all the hidden patterns in a problem set (discrepant CDRs). Imagine how this would change your day in terms of boosting analyst efficiency, achieving cost efficiencies as a department, being prepared to handle new and upcoming technologies and always staying one step ahead of the curve.
That magnet has a name, and its name is “Zen”. Subex recently launched ROC Revenue Assurance 5, and along with RevenuePad (which my colleague Moinak will write about), Zen is one of the fundamental pillars of this ground-breaking solution.
Zen is an automated Root Cause Advisory engine which provides, for the first time ever, machine intelligence for pattern identification and presentment. What makes it revolutionary is that the engine is programmed to sniff out patterns with minimal involvement from the analyst. Give Zen two data sets, and it will tell you exactly why some CDRs in data set 1 are not present in data set 2. This also involves telling the analyst what percentage of the total data set can be linked to any particular pattern. Since pictures speak louder than words, here is a sample illustration:
Zen is essentially a data analytics engine for ROC Revenue Assurance 5. Based on the discrepant sets identified as the result of an audit, Zen automatically fires up the pattern analytics engine. As Zen works on identifying the patterns, it also works on linking the patterns to specific CDRs (to ensure that an audit trail would be maintained). Finally, Zen presents the analyst with a comprehensive view of:
- All identified patterns in the discrepant data set
- Distribution of how many CDRs are linked to which pattern
- Historic event indicators to further guide the analyst towards the root cause
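The pattern discovery described above can be pictured as a frequency analysis over the field values of the discrepant CDRs. The sketch below is a minimal illustration of that idea in Python; it is not Subex's actual algorithm, and the field names and sample records are hypothetical.

```python
from collections import Counter

def discover_patterns(discrepant_cdrs, fields):
    """Count how often each (field, value) pair occurs in the discrepant
    set and report each candidate pattern's share of the total, largest first."""
    total = len(discrepant_cdrs)
    counts = Counter(
        (field, cdr[field]) for cdr in discrepant_cdrs for field in fields
    )
    return [
        {"field": f, "value": v, "share": n / total}
        for (f, v), n in counts.most_common()
    ]

# Toy discrepant set: CDRs present in the switch feed but missing from billing.
discrepant = [
    {"call_type": "MOC", "destination": "TR"},
    {"call_type": "MOC", "destination": "TR"},
    {"call_type": "MOC", "destination": "TR"},
    {"call_type": "CFW", "destination": "TR"},
    {"call_type": "SMS", "destination": "US"},
]
patterns = discover_patterns(discrepant, fields=["call_type", "destination"])
print(patterns[0])  # → {'field': 'destination', 'value': 'TR', 'share': 0.8}
```

A real engine would consider combinations of fields and suppress uninformative values, but the principle – rank shared field values by their share of the discrepant set – is the same.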
Zen is keyed towards two “Intelligent” actions:
- Pattern Analytics
- Analyst Feedback integration
We refer to Patterns as “Areas” and the learning from past investigations as “Reasons”. Why do we need both, you ask? The answer is fairly simple – the same pattern (or Area) might have presented itself for very different “Reasons”. A simple example of subscriber reconciliation between the HLR and Billing might clarify this point.
An analyst, on performing the HLR vs Billing subscriber reconciliation, finds that 20 subscribers on the HLR are not present on the Billing platform. Now, in the absence of provisioning logs, he/she might surmise that this is a simple case of provisioning error and forward the information to the relevant teams.
However, if the same discrepancy is seen next week for the same set of subscribers, it might be prudent to address the possibility of internal fraud as well. Here we see an example where the same pattern (20 subscribers are missing repeatedly in billing but are provisioned on the network) might be due to two distinct “Reasons” – Provisioning Error or Internal Fraud.
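To make the example concrete, here is a minimal sketch of how a repeat discrepancy can point to a different “Reason” than a one-off. The subscriber IDs and classification labels are hypothetical, and this is an illustration of the idea rather than product code.

```python
def reconcile(hlr_subs, billing_subs):
    """Subscribers provisioned on the HLR but absent from billing."""
    return set(hlr_subs) - set(billing_subs)

def classify(this_week, last_week):
    """A discrepancy repeating for the same subscriber suggests a
    different Reason than a one-off provisioning error."""
    repeats = this_week & last_week
    return {
        sub: "possible internal fraud" if sub in repeats else "provisioning error?"
        for sub in this_week
    }

week1 = reconcile({"A", "B", "C"}, {"A"})            # B and C missing
week2 = reconcile({"A", "B", "C", "D"}, {"A", "C"})  # B missing again, D new
verdicts = classify(week2, week1)
print(verdicts["B"])  # → possible internal fraud
```

Same pattern, two candidate Reasons: the repeat offender "B" warrants a fraud investigation, while the newly missing "D" may just be a provisioning lag.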
Zen helps you tie it together. Reasons are incorporated into the Zen engine based on “Acknowledgments” received from various teams. This helps to ensure that “False Reasons” are minimized. In this manner, Zen becomes a repository of Analyst intelligence to address the world-over issue of Knowledge Management in RA.
Zen is a virtual analyst who never sleeps, eats or goes on vacations. For sure he will never leave the team (taking his accumulated knowledge with him).
In conclusion, I want all of us to take a moment to step into Janet’s shoes. The pin is in the haystack, and the stack is getting bigger and bigger all the time (due to burgeoning volumes and technology/product complexity). The timelines to find that pin are ever-shrinking, and cost reduction is the call of the hour globally.
How is your team planning on finding that pin?
Zen? It’s a simple dashboard.
I have better names:
Tad: yet another dashboard
Yar: yet another report
Subex doesn’t have technology; it’s just a database and dashboard.
You can build your own.
A .bmp attachment of almost 3MB? Really?
On-topic:
If you’re talking about “patterns” that are implemented with “minimal involvement from the analyst”, then who exactly decides those patterns? Usually leakage scenarios are pretty unique to each operator, apart from extremely obvious cases.
Since you mention “learning from past investigations”, then:
What is the learning period necessary for Zen to be effective?
What’s the expected % of errors this analysis would give? Basically how is the quality of the analysis measured?
Hi Rajun,
I’m not sure how you surmised it is a dashboard. Just for clarity, it is an analytic engine which:
a) Analyzes two sets of data
b) Identifies underlying common patterns as to why some CDRs are present in set 1, but not set 2
c) Highlights the distribution of patterns to the analyst
For example, let us consider records being audited between MSC and Retail Billing. When the audit shows that 32,000 CDRs are present in MSC but not in Billing, Zen would do the following:
a) Goes through the complete set of 32,000 CDRs and tries to identify common patterns, e.g. 92% of the missing CDRs are calls to a particular country, 6% of the records are call forwarding, and 2% are of a particular event code.
b) Presents the patterns in the previous point to the analyst, with one more treatment. Zen checks to see if any of the patterns identified have registered any cases earlier. If yes, it will associate an acknowledged case to the identified pattern so that the analyst can look at historic information regarding how the case was closed.
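The second step – checking whether an identified pattern has registered an acknowledged case before – is essentially a lookup against case history. Here is a hedged sketch of that association; the case-history structure and field names are my own invention for illustration, not the product's data model.

```python
# Hypothetical history of acknowledged cases, keyed by pattern signature.
case_history = {
    ("destination", "TR"): {"case_id": 101, "closed_as": "interconnect config error"},
}

def annotate(patterns, history):
    """Attach any acknowledged historic case to each identified pattern,
    so the analyst can see how a similar case was closed before."""
    return [
        {**p, "prior_case": history.get((p["field"], p["value"]))}
        for p in patterns
    ]

patterns = [
    {"field": "destination", "value": "TR", "share": 0.92},
    {"field": "call_type", "value": "CFW", "share": 0.06},
]
annotated = annotate(patterns, case_history)
print(annotated[0]["prior_case"])  # → {'case_id': 101, 'closed_as': 'interconnect config error'}
```

A pattern with no prior case (here, the call-forwarding one) is simply presented without historic guidance, leaving the analyst to investigate it fresh.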
The difference here, which I would classify as specialist “technology”, is that while earlier as RA analysts we would be presented with a large volume of discrepant CDRs, now with Zen the analyst is also presented with a pattern analysis which enables quick Root Cause identification.
Only the presentment has been shown in the above blog, and yes we do use a dashboard to show the findings from Zen.
Does this help you understand the Zen engine better? If not, please feel free to reach out to me and I can help you with further information.
Hi Dumitru,
Sorry about the huge attachment. We had requests from other quarters to provide a high-resolution screenshot of the output of Zen, and therefore we decided to put it up first at TalkRA.
As for your queries, I have tried to summarize them for the benefit of all the readers.
a) Who implements the patterns? – I’m glad you asked this question. The patterns are not implemented by anybody. There are no preset templates for patterns because as you rightly mentioned, patterns which point to the root cause vary tremendously from operator to operator. What Zen does is “discover” the patterns. In my response to Rajun, I have provided a simple example of how this discovery process occurs. Essentially this is a Self-learning component which doesn’t require pre-configured patterns. Zen dives through the discrepant data set and tries to find commonality in the field values of all the CDRs. It then ties these commonalities into a consistent pattern and presents the set to the analyst. If you download the bmp you can see the way these results are presented in the dashboard chart entitled “Top 10 Areas”.
b) What is the learning period necessary for Zen to be effective? – Here, we need to understand that Zen has two components. The first is the data analysis engine, which is effective from day 1 as it doesn’t work on historic information but on real-time analysis of patterns in discrepant records. The second component is the “Reason”, which is based on analyst feedback from cases generated due to a particular pattern. Typically, the second part would be effective only after at least a week. From that point on, as more cases are closed, the accuracy of the Reasoning engine increases.
c) What’s the expected % of errors this analysis would give? Basically how is the quality of the analysis measured? – The engine is designed to address 100% of errors (by errors I assume you refer to discrepancies identified in audits). Zen doesn’t follow a sampling methodology and would dig through the complete data-set. Keep in mind, if there is an underlying pattern to the discrepancy set, Zen will identify it. The quality of the analysis is measured based on “Acknowledgments” which is a direct input to the Reason engine. The Reason engine ties Patterns to Acknowledged hits so that analysts do not waste time going after potentially dead-end patterns, or patterns which have been identified as “False Leads” earlier.
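One way to picture the acknowledgment loop described above is a running score per (pattern, reason) pair, so that reasons which other teams repeatedly reject sink in the ranking as “false leads”. The sketch below is a minimal illustration under that assumption; the class name, pattern labels, and scoring rule are hypothetical, not Subex's implementation.

```python
from collections import defaultdict

class ReasonEngine:
    """Ties patterns to Reasons based on acknowledgments from other teams;
    frequently rejected ("false lead") reasons sink in the ranking."""

    def __init__(self):
        self.scores = defaultdict(lambda: {"ack": 0, "rejected": 0})

    def feedback(self, pattern, reason, acknowledged):
        self.scores[(pattern, reason)]["ack" if acknowledged else "rejected"] += 1

    def reasons_for(self, pattern):
        ranked = [
            (reason, s["ack"] / (s["ack"] + s["rejected"]))
            for (p, reason), s in self.scores.items()
            if p == pattern
        ]
        return sorted(ranked, key=lambda r: r[1], reverse=True)

engine = ReasonEngine()
engine.feedback("repeat HLR/billing gap", "provisioning error", acknowledged=False)
engine.feedback("repeat HLR/billing gap", "internal fraud", acknowledged=True)
print(engine.reasons_for("repeat HLR/billing gap"))
# → [('internal fraud', 1.0), ('provisioning error', 0.0)]
```

The point of the design is that the quality measure is grounded in acknowledgments rather than the engine's own confidence, which is what keeps “False Reasons” from accumulating.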
Does this help you Dumitru?
It’s unfortunate that such a site has become an obvious sales tool for software vendors.
Regarding the technology, there is no doubt that profiling tools have a role in Revenue Assurance and Fraud, but this technology has been around for many years – so what’s new?
I think it’s important to manage expectations in relation to these claims. The ability to perform root cause analysis using two sets of data is a limited way of detecting issues, which often require the interrogation of external data in a hierarchical, stepped logic in order to eliminate likely causes of errors. System standing data issues are often the underlying cause of errors in current times, and using the two source data reconciliation sets in a profiling analysis will generally not detect the underlying issue. It is often necessary to introduce a third or fourth data set in order to eliminate false negatives and then drill down to the core issues.
Any learning capability built on such limited thinking is a waste of time.
Please give us something new and useful.
I wondered if the word “error” may be interpreted wrongly. My bad.
What I was asking was what is the target success rate of this analytic engine.
If I understood correctly, the value of such a tool is that it’s supposed to do an automatic root cause analysis and point to the source of the problem, rather than just showing discrepancies in data. It’s also supposed to get better over time. Typical AI engine.
Now, based on: 1) quality of the data, 2) quality of the human input and 3) quality of Subex algorithm, this tool will be spot-on on some cases, and possibly wrong on other cases.
My question was if there’s a proposed success rate that this tool will provide. What’s the expected level of false positives and what’s the target for the tool?
Regarding my first question, you’re saying that the system will start generating results after a week; however, in my experience it takes a bit longer than that for a system to start producing usable results. Understandably this is a new product that has not been extensively used, so that’s why I was asking if there’s any commitment on analysis quality. So if after a week the false leads are at 80%, what’s the expected behavior after 6 months? 60%? 10%?
Of course the system will get better if exactly the same situation occurs over and over, but unfortunately this is rarely the case with RA. So this is where I’d expect the quality of the algorithm to come in, and identify a new scenario as a confirmed pattern even if it’s not 100% identical to past cases.
Hi Dumitru,
“If I understood correctly, the value of such a tool is that it’s supposed to do an automatic root cause analysis and point to the source of the problem, rather than just showing discrepancies in data. It’s also supposed to get better over time. Typical AI engine.”
You are absolutely right.
“Now, based on: 1) quality of the data, 2) quality of the human input and 3) quality of Subex algorithm, this tool will be spot-on on some cases, and possibly wrong on other cases.”
Again, full points to you.
“At my first question you’re saying that the system will start generating results after a week, however in my experience it takes a bit longer than that for a system to start producing usable results.”
Yes and no. In my response, I meant that Zen would start presenting pattern results immediately, but the reasoning engine would only start providing links to previous cases for false-positive identification after a week or more. And after just a week, we do not expect the results to be perfect. The longer the tool is used, the better and more accurate the “Reason” validation becomes.
“I was asking if there’s any commitment on analysis quality.”
Yes Dumitru, we do have a number, which is the result of approximately 2 years’ worth of trials in our labs as well as with some very co-operative telecom operators. However, we would not want to release this commitment to the market without proving the engine in multiple production environments, as we do not want to publish a target success vector which might be misleading. What I can tell you is that, in an environment where data is processed daily, the time-frame to reach roughly 80% accuracy is currently far less than 6 months.
“So this is where I’d expect the quality of the algorithm to come in, and identify a new scenario as a confirmed pattern even if it’s not 100% identical to past cases.”
I completely agree with you and this is why we are excited about Zen. We had presented an approach to automated Root Cause Analysis to the general public during the 10th RA and Billing conference which was held in Thailand in 2010. At that time we gave away our approach to the operator audience in the hope that we could collaboratively come up with a potential solution. Back then, the approach we were thinking about was extremely effective for repeat cases, but failed for new cases which have at least a 30-40% match with historic cases.
The new approach uses some really fancy mathematics to effectively link potentially related areas. The results, both in our labs and in our operator environment, have proven conclusive. In addition to repeat cases, new cases with overlaps have been identified successfully. Keep in mind the word “overlaps” – here I refer to some basic level of commonality at a minimum, eg. same network element, same price plan, same number range, repetitive or recursive patterns etc.
I had written about some of the components in 2010 (http://talkra.com/archives/1286), but most of it has undergone significant rework. But I still feel reading the old post might give you some insight into our thought process back then.
I hope I was able to answer your queries effectively. But please do feel free to ask more if you need further clarity.
Hi Hakan,
I’m sorry you feel that way. Setting aside the fact that I work for a vendor, I was also one of the first writers on this site. Our goals for this site, from its inception to date, have been transparency and, in a way, democracy. Proof of that is the fact that your comment was approved for posting, and Eric (as he controls the site) can confirm that I approved your comment for posting.
Let me start off by saying that we wanted to post this product update since the feature we are unveiling is something which has not existed. Period. I am personally quite passionate about this area since it is a culmination of nearly 4 years of thought, sweat and frustration. Do you really want to dismiss Zen simply because a vendor wrote about it? Is it inconceivable that some of us actually want to help progress this discipline?
As I’ve mentioned in my response to Dumitru, Subex gave us the go-ahead to share our concept for automated Root Cause analysis freely with an open forum. We did so at an RA conference in Thailand back in 2010, in the hope of finding a collaborative solution. Two years down the line we have come up with something which we believe will change the face of root cause analysis. Allow me to help you understand Zen.
“Regarding the technology, there is no doubt that profiling tools have a role in Revenue Assurance and Fraud, but this technology has been around for many years – so whats new?”
You are right, profiling tools have been around for many years. Zen is not a profiling tool.
What Zen does, in simple terms, is what an analyst does when he/she is investigating a discrepancy – looking at the discrepant data to identify hidden commonalities or patterns which are not obvious and which no amount of profiling can identify. If this task were easy, then RA departments globally would not require expert RA analysts on their teams. Profiling and pattern identification are two very separate concepts.
“…often require the interrogation of external data in a hierarchical, stepped logic in order to eliminate likely causes of errors.”
I agree. This is why Zen is split into two parts:
a) Pattern Identification – It only identifies potential root causes, eg. all discrepant records are MOC CDRs calling Turkey.
b) Reason Engine – Once the case has been generated, this engine tracks all the worksteps undertaken by an RA analyst to interrogate other systems. Based on acknowledgments, Zen “learns” whether the Root Cause identified was right or wrong.
“It is often necessary to introduce a third or fourth data set in order to eliminate false negatives and then drill down to the core issues.”
Agree. Again, false positive reduction is the goal of the Reason engine. Also, I believe that I might have misled you into believing that Zen works on only 1 or 2 data sets. My examples in the blog and the query responses are only for simplicity. We are not limiting the sources of information to only two data sets in any way whatsoever.
Hakan, you have used the term “profiling” multiple times in explaining why you do not agree with Zen. I think this is the basis of the confusion. I repeat for the sake of absolute clarity – Zen is not a data profiling feature.
Profiling is a basic, fundamental capability and we (Subex) would not be very excited about it – for sure we would not have done an international product launch with blog updates for profiling :)
I’m sure that visualization of the Zen solution might be a bit difficult, but please feel free to reach out to me and I’ll try to help you with a demo walk-through to show the differences between a profiling engine and Zen.
@ Ashwin,
You are more generous than I am. I looked at Hakan’s insulting comment last night, and decided not to approve it, although I didn’t delete it either. You kindly went ahead and approved it and even took time to write a response that was more polite than Hakan’s comment deserved. For that, I applaud you.
@ Hakan,
“Its unfortunate that such a site has become an obvious sales tool for software vendors.”
I thought that Rob Mattison and Gadi Solotorevsky were by far the two biggest clowns in the industry, but now I see they have new competition. I encouraged Ashwin to write this post because I wanted him to explain some of the new functionality described for Subex’s ROC version 5. What would you prefer? That we don’t mention it?
Over the years talkRA has been accused of anti-vendor or pro-vendor bias by employees of Subex, Connectiva, cVidya, and WeDo, never mind some of the abusive comments we get from anonymous jackasses. We’ve had lies told about us by Papa Rob and his cronies. We’ve been dismissed as irrelevant or a failure on countless occasions. I even got booted out of BT for blogging the truth about the ‘World RA Forum’ scam they tried to run with cVidya.

But our readership keeps growing and growing. Why? Because we tell the truth you can’t find anywhere else. We were first to slam GRAPA as a snake oil scheme designed to exploit naive RA analysts, crooked RA managers and gullible HR departments in telcos – and we did this when most of the industry was rushing to join GRAPA or trying to do business with it. We highlight the inconsistencies in what vendors say about their business performance, and we give as much coverage to their failings as their successes. We’re keeping people informed of how Malawi’s regulator is trying to justify a needless threat to customer privacy under the pretext of RA. We didn’t shy away from the big story that Razorsight stole intellectual property from TEOCO. We’ve explained how regulators lie to customers over bill accuracy. We’ve named the telco hypocrites that boast about their integrity whilst overbilling customers. We point out the manipulative control that cVidya now exerts over the TM Forum, even though most prefer to repeat their lies.

And now we’re seeing an open and public examination of the functionality in Subex’s new software, where the vendor’s explanation is being challenged by members of the public. I want that challenge to be constructive. And I believe we should continue to do more of the same, because letting vendors explain what they offer, and then publicly respond to questions about their products, is the way forward for openness and transparency in our industry.
Isn’t it about time that there was somewhere the vendors could explain exactly what their software does, and where ordinary people can push back and make the vendors explain and justify themselves further? I think so. And if we do that, we should thank the vendors with the courage to take part in that process. The vendors with something to hide will simply refuse to participate in the kind of 2-way communication we’re trying to create here.
Am I proud of what we’re doing? Heck yes. Should we stop? Never. And the size of our audience reveals how many decent folk agree with what we’re doing. This month we’ve already achieved the second-highest readership figures in our long history, and there’s still a few days left. We’re on the right track. Get on board with us, or buy a one-way ticket for the fantasy ride offered by the cheats and fools. I passionately believe that what we do here is good for our industry. I say you should cut the lazy cynicism and show active support for what we’re doing by engaging and contributing to our program. We give a lot and we demand little in return. It’s time to stop asking what we can do for you, and to start asking what you can do for us and for the good of our industry.
Well Ashwin, as I am sure you know, the devil is in the details.
Your answer makes me want to ask 20 more questions, but I’ll just say that in my opinion Subex will have to share a lot more on how the algorithm works. Some clear cases would be good too, since you say it’s been used in some operators.
Ashwin and All-the-debaters-who-kept-it-honest-and-professional,
One of the reasons I am on talkRA is the fact that here, vendors, operators, industry bodies, commentators and pretty much everybody can give their honest opinion. I would request that we refresh ourselves on the mission of talkRA and particularly one sentence that I think makes talkRA special (in my view anyway):
…[talkRA] provides a platform for thought leaders, allowing them to communicate and exchange ideas without imposing any limitations based on employment or affiliations…
Ashwin proudly works for Subex, and discloses as much even in his blogger profile. From what I see in his post, he freely shared information regarding an accomplishment that Subex has recently had. It would benefit the wider RA community if we also respectfully critiqued what he has shared. Who knows, maybe Subex, or some other operator, will also gain some more light-bulb moments that will make the practice of RA, fraud management, revenue management and business intelligence in telecoms even better. It was refreshing to see how Ashwin and Dumitru have engaged on this piece.
Hi Dumitru,
I can understand the 20 questions perspective. Tell you what, I shall try and clear some of our test cases to publish on TalkRA. As our customers were essentially allowing us to run a test-bed on their data, there are some legal (and trust) obligations which we have to maintain.
As soon as I can get it cleared, I’ll try to put up some data set results which would help you gain a deeper understanding. I would request some patience as this involves cross-time-zone communication and clearance from various stakeholders in the operator organization.
I regret to inform you that I cannot share the specifics of our algorithms, but I can and will help you dissect its functionality and applied benefits. So, short of giving away the algorithm, I will be more than happy to explain in detail how the root causes are identified and how the reasoning engine is used for feedback integration.
Eric,
You do hate some vendors and hate others less.
I can say for sure that you don’t like any of the vendors.
Did you ever try to work for one of them?
Rajun,
Having known Eric for some 11-12 years now, and having worked with him during which time I got a full appreciation of his standards, in my opinion you are challenging one of the most highly respected individuals within this domain. I have worked for several vendors that have had both positive and negative comments written about them on this forum, and the one thing that I can confirm is that Eric has, to this day (and I’m sure moving forward), only ever strived to achieve a couple of things through his forum:
Be dedicated to promoting the development of the practice of revenue assurance, fraud management, revenue management and business intelligence in telecommunications service providers and other industries…
And
Provide a platform for thought leaders, allowing them to communicate and exchange ideas without imposing any limitations based on employment or affiliations.
With this in mind I believe that it might be a consideration for you to spend more time reading the content rather than challenging it.
Max, thank you for affirming the integrity of this fine forum. I would only add that we should not be surprised when readers are highly critical of submitted articles, especially those from people who have something to sell us.
Though Hakan was perhaps a bit harsh, talkRA is no stranger to highly opinionated columns on RA, real-life Big Foots, and all subjects in between. Sir Eric has trained us well: bark often and loudly to scare away imposters.
Any vendor who tries to pull a fast one on talkRA readers has another thing coming. And to his credit, Ashwin is bloody but unbowed.
I found it interesting that this post was closely followed by an interview with Lionel Griache on his open source RA initiative. Both stories are about the value of RA software, and I think both parties have more educating to do to please the many skeptics out there – I being one of them.
First off – Lionel, we congratulate you for your achievement. Now you face the daunting challenge of building a user community, first class documentation, consulting support, training, and marketing around your new tool. Those challenges certainly put the advantages in the court of existing RA software vendors who have many years’ experience addressing these concerns.
Ashwin, thank you for your fine introduction to Zen and for fielding some tough questions here. Unfortunately, you’re not out of the woods yet, my good colleague :- )
What raised my eyebrows a bit was the sentence: “Zen is an automated Root Cause Advisory engine which provides, for the first time ever, machine intelligence for pattern identification and presentment.” [my emphasis added]
Data warehousing and machine intelligence have been around an awfully long time. Back in 1994, I authored a research report on Telecom Data Warehousing, so the claim to have a “first time ever” technology sounds like a stretch. It may very well be true, but you haven’t yet made the case to justify that broad claim – though I forgive your exuberance because I know you’re proud of the work you and your fellow Subexians put into Zen.
As luck would have it, a week before I ran across your column today, I found a relevant blog article by Jeffrey Phillips, an expert who has written a few books on the subject of business innovation. His article is appropriately named, “Finding Needles in Haystacks”.
Phillips makes the case that meta-data management is the secret sauce for the next breakthrough in analytics. He argues that the average business must catalog information at the rate of 5 Megabytes a day. For a 50-person firm, that translates to about 1 Gigabyte of information a month. Not only is the rate of information to be searched growing fast, but we lack a consistent way of classifying that information. So we quickly run into the common situation where a powerful search engine like Google often can’t even get close to the specific information we need to know.
So my question to you, Ashwin, is: how does Zen make use of meta-data? Or put in another way: how does it interact with the human RA experts who are teaching the machines what is relevant?
http://workingsmarter.typepad.com/my_weblog/information_management/
Max, thank you for affirming the integrity of this fine forum. I would only add that we should not be surprised when readers are highly critical of submitted articles, especially those from people who have something to sell us.
Though Hakan was perhaps a bit harsh, talkRA is no stranger to highly opinionated columns on RA, real-life Big Foots, and all subjects in between. Sir Eric has trained us well: bark often and loudly to scare away imposters.
Any vendor who tries to pass a fast one over talkRA readers has another thing coming. And to his credit, Ashwin is bloody but unbowed.
I found it interesting that this post was closely followed by an interview with Lionel Griache on his open source RA initiative. Both stories are about the value of RA software, and I think both parties have more educating to do to please the many skeptics out there – I being one of them.
First off – Lionel, we congratulate you for your achievement. Now you face the daunting challenge of building a user community, first class documentation, consulting support, training, and marketing around your new tool. Those challenges certainly put the advantages in the court of existing RA software vendors who have many years’ experience addressing these concerns.
Ashwin, thank you for your fine introduction to Zen and for fielding some tough questions here. Unfortunately, you’re not out of the woods yet, my good colleague :- )
What raised my eyebrows a bit was the sentence: “Zen is an automated Root Cause Advisory engine which provides, <b>for the first time ever</b>, machine intelligence for pattern identification and presentment.” [my emphasis added]
Data warehousing and machine intelligence have been around an awfully long time. Back in 1994, I authored a research report on Telecom Data Warehousing, so the claim to have a “first time ever” technology sounds like a stretch. It may very well be true, but you haven’t yet made the case to justify that broad claim – though I forgive your exuberance because I know you’re proud of the work you and your fellow Subexians put into Zen.
As luck would have it, a week before I ran across your column today, I found a relevant blog article by Jeffrey Phillips, an expert who has written a few books on the subject of business innovation. His article is appropriately named, “<a href="http://workingsmarter.typepad.com/my_weblog/information_management/">Finding Needles in Haystacks</a>”.
Phillips makes the case that meta-data management is the secret sauce for the next breakthrough in analytics. He argues that the average business must catalog information at the rate of 5 Megabytes a day. For a 50-person firm, that translates to more than 1 Gigabyte of information a month. Not only is the amount of information to be searched growing fast, but we lack a consistent way of classifying that information. So we quickly run into the common situation where a powerful search engine like Google often can’t even get close to the specific information we need to know.
So my question to you, Ashwin, is: how does Zen make use of meta-data? Or, put another way: how does it interact with the human RA experts who are teaching the machines what is relevant?
“Bloody but unbowed” – Thank you kind sir. As somebody who thoroughly enjoyed “Gladiator”, I’m having dreams of grandeur right now.
Dan, you have raised very intelligent questions and I shall try to answer them. However, just before I start, I wanted to clarify “first time ever”. I still stand by this, since I have yet to hear any operator globally in the RA space tell me (or mention in passing on any site) how their RA system helps by directly presenting them the Root Cause. Subex didn’t invent machine intelligence (and if you ever hear me claiming this, please feel free to beat me on the head with a Sunfire X series server), but we are certainly the first to bring it to RA for the purpose of Root Cause detection. Of course, this is a very bombastic claim – but I am, as always, open to correction.
“How does Zen make use of meta-data? Or put in another way: how does it interact with the human RA experts who are teaching the machines what is relevant?”
This line tells me that you have in fact clearly understood exactly what I was attempting to explain in my blog, since you’ve hit the crux of the matter. Now I will attempt to answer your query (at a generic level, since the actual algorithms cannot be disclosed).
Zen works at two levels –
a) Operational data (eg. CDR)
b) Case Management (i.e. system internal meta-data for analyst interaction)
At the operational data level, Zen performs data pattern analysis on a minimum of two data sets (eg. MSC and Mediation output). This involves automated field-level data gathering (including the values in the fields), classification and presentment. The result is a list of potential root causes (let’s call them PRCs for simplicity).
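To make the idea concrete, here is a minimal sketch of that kind of field-level comparison. To be clear, this is purely illustrative: the actual Zen algorithms are proprietary and not disclosed, and every name here (the function, the sample fields, the pattern signature) is hypothetical. The idea is simply to compare the same CDRs as seen by two sources, field by field, and group the mismatches into candidate patterns – the PRCs.

```python
# Illustrative sketch only -- not Subex's actual algorithm.
from collections import Counter

def find_potential_root_causes(msc_cdrs, mediation_cdrs, key="call_id"):
    """Group field-level mismatches between two CDR streams into PRCs."""
    mediated = {cdr[key]: cdr for cdr in mediation_cdrs}
    patterns = Counter()
    for cdr in msc_cdrs:
        other = mediated.get(cdr[key])
        if other is None:
            patterns[("missing_in_mediation", None, None)] += 1
            continue
        for field, value in cdr.items():
            if other.get(field) != value:
                # Record which field diverged and how -- the pattern signature.
                patterns[(field, value, other.get(field))] += 1
    # The most frequent mismatch patterns are the strongest PRC candidates.
    return patterns.most_common()

msc = [{"call_id": 1, "duration": 60, "trunk": "A"},
       {"call_id": 2, "duration": 30, "trunk": "B"}]
med = [{"call_id": 1, "duration": 60, "trunk": "A"},
       {"call_id": 2, "duration": 0,  "trunk": "B"}]
print(find_potential_root_causes(msc, med))
# One PRC: a duration of 30 in the MSC record became 0 after mediation.
```

In a real deployment the “pattern signature” would of course be far richer than a field/value triple, but the shape of the output – ranked candidate patterns rather than raw CDRs – is the point.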
At the case management level, Zen performs a recursive, weighted analysis that assigns a “Reason” to each PRC. This is done by the system “looking and learning” from what analysts have noted in the past. The case management functionality of the ROC RA solution provides a single consolidated workflow environment where all the information pertaining to a detected discrepancy is captured, eg.:
a) What data sources were involved
b) What is the discrepancy level
c) What is the threshold as per the defined KPI
d) CDR pertaining to this audit
e) Auto-assignment to analyst
f) Communication and escalation paths
g) etc.
Now, in each new “case”, the analyst has various fields to fill in based on their investigation. We call such a set of steps a “Workstep”. At the end of each workstep, the case gets assigned to the next team in that particular workflow (eg. Network team, mediation team etc.).
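A toy sketch of that routing flow might look like the following. Again, this is purely hypothetical – the workflow stages, team names and case fields are mine, not the ROC RA schema – but it shows how each completed workstep both records the analyst’s findings and hands the case to the next team:

```python
# Hypothetical workstep routing -- stage names and fields are illustrative.
WORKFLOW = ["RA analyst", "Network team", "Mediation team"]

def complete_workstep(case):
    """Record the analyst's findings and route the case to the next team."""
    step = case["workstep"]
    case.setdefault("history", []).append((WORKFLOW[step], case["findings"]))
    case["workstep"] = step + 1
    if case["workstep"] < len(WORKFLOW):
        case["assigned_to"] = WORKFLOW[case["workstep"]]
    else:
        case["assigned_to"] = None  # workflow complete, case can be closed
    return case

case = {"workstep": 0,
        "findings": "duration zeroed after mediation",
        "assigned_to": WORKFLOW[0]}
complete_workstep(case)
print(case["assigned_to"])  # -> Network team
```

The `history` list is the interesting part for Zen: it is exactly the kind of structured, per-case trail of analyst observations that a learning layer can mine.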
As you can probably surmise, the system captures a significant amount of data (both CDR and Meta-data) in each Case docket.
This information is fed through the Zen platform on a daily basis so it can keep updating the associated “Reason” database. The platform also analyses the “Acknowledgement” trend in inter-workstep communications to ascertain the “Positive Hit-Rate” (or PHR) of investigations. Only those reasons with a high PHR are carried forward to the next stage (there is an algorithm which scores all the reasons and keeps updating reason scores periodically based on the PHR updates).
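A back-of-the-envelope sketch of a PHR-style scorer might look like this. The actual Zen scoring algorithm is not disclosed, so the class, the threshold and the example reasons below are all my own illustrative assumptions: each candidate Reason accumulates positive and negative acknowledgements from closed cases, and only reasons whose hit-rate clears a cut-off are carried forward the next time a matching PRC appears.

```python
# Illustrative PHR scoring sketch -- names, formula and threshold are
# hypothetical, not the disclosed Zen algorithm.
class ReasonScore:
    def __init__(self):
        self.positive = 0
        self.total = 0

    def acknowledge(self, confirmed: bool):
        """Record one acknowledgement outcome from a closed case."""
        self.total += 1
        if confirmed:
            self.positive += 1

    @property
    def phr(self):
        """Positive Hit-Rate: share of acknowledgements that were positive."""
        return self.positive / self.total if self.total else 0.0

def carried_forward(reasons, threshold=0.6):
    """Keep only reasons whose PHR clears the cut-off."""
    return [name for name, score in reasons.items() if score.phr >= threshold]

reasons = {"mediation filter drops zero-duration CDRs": ReasonScore(),
           "clock drift on MSC": ReasonScore()}
for outcome in (True, True, True, False):
    reasons["mediation filter drops zero-duration CDRs"].acknowledge(outcome)
for outcome in (False, False, True):
    reasons["clock drift on MSC"].acknowledge(outcome)
print(carried_forward(reasons))
# Only the mediation-filter reason (PHR 0.75) survives the 0.6 cut-off.
```

The essential property, whatever the real scoring function, is that analyst feedback keeps re-ranking the reason list, so the system’s suggestions improve with use.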
Now, the drawbacks to this approach are:
a) The system is not accurate from day 1. Building the Reason and PRC database will take time. PRCs can be generated from Day 1, since that engine is based on pattern analysis alone; but associating them with Reasons will take time, as it depends directly on analyst inputs which need to be built up over a period of time.
b) If the operator does not believe in a strong RA process and does not use the case management functionality, Zen will only be able to deliver Potential Root Causes (PRC) without any level of associated Reasons. This is a situation where only half the platform is being used, and is not the recommended approach.
Luckily for us, in the case of ROC RA, we aren’t necessarily in the same boat as a Google search. Throughout the application, we religiously link all the components of an audit (in a tightly coupled/bi-directional linked manner), right from the raw files all the way to dashboards and mail communication regarding the audit. As a result, the meta-data management is quite structured and the curse of free-text, unstructured information is mitigated to a large extent.
Dan – I believe I’ve ranted for long enough, but does this help you gain a deeper understanding of the Zen platform?
Thanks, Ashwin. You’ve given us some good detail to comment on. You may now remove your helmet, dust off your uniform, and join the rest of us drinking wine at the public bath :- )
The focus on Root Causes is certainly something I have not heard other software vendors talking about – though “root cause” analysis is commonplace in the service assurance and cybersecurity camps.
I am interested to hear comments and follow-on questions from the experts reading this forum.
Your system sounds like it uses plain old neural networks to me, which have been around since the days of Alan Turing, and became popular in the late eighties for pattern recognition in image and video detection. I know neural networks have been used extensively in fraud detection for some time now but not so much in RA. I’d be keen to hear if any readers have found any benefits of Zen, over and above the existing software supplied by Subex.
“So long, and thanks for all the fish”
I’ve been hitch-hiking through the known and unknown quadrants of the RA universe, and as is evident from my lack of articles in the recent past, I now resemble Tom Hanks from “Cast Away”. But now that I’ve found a way to re-integrate into society, I come bearing gifts!!!
We have been trying to get customers to reveal to the world the benefits they have seen from the Zen platform (proof of the pudding, as it were). We have finally gone live at a couple of operators who have agreed to do so.
Zen is being run at both these customers as an analytic layer prior to final discrepancy presentment. The operators measured various metrics pertaining to investigation timelines, case closure rates, case efficiency analysis etc. and have provided us with the net savings report in terms of analyst productivity enhancement.
Subex will be launching the same to the world at large as well, but we would like to provide the talkRA readers with a first look at the benefits from a telecom operator environment.
Interested readers can find the infographic at this link – http://bit.ly/Rh8eAG