Big Data Machine Learning: Telco Fraud Detection Points the Way

Invisible and instant: those are the two key characteristics of mobile fraud today. International crime rings are making millions of dollars by using highly sophisticated scams across multiple geographies, and disappearing before the operator knows the attack is happening. Most major attacks today are ‘fraud cocktails’: unpredictable mixtures of several fraud types, striking with unprecedented volume and velocity.

Operators responding to the Communications Fraud Control Association’s 2017 survey estimated that they lost USD29bn to fraud. More worryingly, many said they believed even more fraud was getting past their defenses but could not pinpoint what it was or how it happened.

The mobile industry is beginning to acknowledge a need for detection methods that are able to adapt to the fast pace of evolving network crime and usage patterns. It has become increasingly evident that traditional, rules-based systems with pre-set thresholds are no longer the answer. Mobile crimes morph far more rapidly than analysts can write rules. To write a rule, a fraud analyst needs to know about the fraud type. It can take days or weeks to analyze a previously unknown attack, during which time huge revenue drains continue to occur.
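To make the limitation of pre-set thresholds concrete, here is a minimal sketch of a single static rule. The threshold value, the SIM identifiers and the traffic figures are all hypothetical, chosen only for illustration. It shows one way an attack can change shape and slip under a rule that was written for an earlier pattern.

```python
# Hypothetical pre-set threshold: alert on any SIM that exceeds this many
# international minutes in a day. The value is an assumption for illustration.
MAX_INTL_MINUTES_PER_DAY = 120


def rule_alerts(minutes_by_sim, limit=MAX_INTL_MINUTES_PER_DAY):
    """Return the SIMs whose daily international minutes exceed the fixed limit."""
    return [sim for sim, minutes in minutes_by_sim.items() if minutes > limit]


# Synthetic traffic: 900 fraudulent minutes concentrated on one SIM...
single_sim_attack = {"sim_001": 900, "sim_002": 15}
# ...and the same 900 minutes spread evenly across ten SIMs.
spread_attack = {f"sim_{i:03d}": 90 for i in range(10)}

alerts_single = rule_alerts(single_sim_attack)  # the rule catches this shape
alerts_spread = rule_alerts(spread_attack)      # every SIM stays under the limit
```

The rule fires on the concentrated attack but stays silent on the spread one, even though the total fraudulent traffic is identical. An analyst would have to notice the new pattern and write another rule to cover it.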

A combined big data and machine learning approach is proving one of the most promising of the new wave of solutions. Already in use in major service provider networks, the machine learning strategy is proving markedly more effective in fighting fraud, delivering 350 percent better results than rules-based systems, and allowing analysts to shut down attacks in moments rather than hours or days. The key is to catch them early, and stop them fast.

It works by applying unsupervised machine learning at massive scale to huge lakes of operator and customer data, to determine the characteristics of normal traffic. Anomalies are identified instantly and the results presented to analysts in real time, both as alerts and as visual representations. Fraud specialists can then readily determine whether an attack is happening. Not all anomalies are fraud, but ALL fraud is anomalous. Fraudulent traffic is usually associated with high bursts of activity and/or long call times.
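The idea of learning what normal traffic looks like and flagging departures from it can be sketched in a few lines. This is not the production method described above. It is a deliberately simple statistical stand-in (a modified z-score based on the median and the median absolute deviation) that needs no labeled fraud examples. The subscriber names, call counts and the 3.5 cut-off are assumptions for illustration.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-score per value, using the median and the MAD as the baseline."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No measurable spread in the data, so this simple score cannot rank anything.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(calls_per_hour, threshold=3.5):
    """Return the subscribers whose activity deviates strongly from the baseline."""
    scores = robust_z_scores(list(calls_per_hour.values()))
    return [sub for sub, score in zip(calls_per_hour, scores) if abs(score) > threshold]


# Synthetic data: five subscribers with ordinary activity and one sudden burst.
traffic = {"sub_a": 4, "sub_b": 6, "sub_c": 5, "sub_d": 3, "sub_e": 7, "sub_f": 240}
anomalous = flag_anomalies(traffic)
```

The flagged subscriber is only a candidate. As the article notes, not every anomaly is fraud, so the result would still go to an analyst for review.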

A cyber gang can set up, go to work, and disappear in 24 hours or less, before an operator knows the attack is happening. If thousands of users in Mexico start calling a number in Cuba or Latvia, it’s very likely that a Wangiri (call-back premium rate) scam is under way. Gangs also take advantage of the incentive plans that operators use to generate revenue from business travelers, inflating traffic, dramatically overusing the plans, or driving traffic to other international revenue share scams.
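The call-back pattern described above, many different subscribers suddenly calling the same foreign number, can be expressed as a comparison against that number's usual traffic. The sketch below is an assumption-laden illustration: the record layout (caller, destination), the phone numbers, the baseline counts and the `ratio` and `min_callers` parameters are all hypothetical.

```python
from collections import defaultdict


def distinct_callers_by_destination(call_records):
    """Count how many different callers dialed each destination number."""
    callers = defaultdict(set)
    for caller, destination in call_records:
        callers[destination].add(caller)
    return {dest: len(who) for dest, who in callers.items()}


def suspected_callback_targets(current_window, baseline_counts, ratio=10, min_callers=50):
    """Flag destinations whose distinct-caller count far exceeds their usual level."""
    current = distinct_callers_by_destination(current_window)
    suspects = []
    for dest, n_callers in current.items():
        usual = baseline_counts.get(dest, 0)
        # max(usual, 1) keeps a destination with no history from dividing the bar to zero.
        if n_callers >= min_callers and n_callers > ratio * max(usual, 1):
            suspects.append(dest)
    return sorted(suspects)


# Synthetic window: 200 different callers dial one foreign number,
# while a domestic number sees its ordinary handful of calls.
window = [(f"+52-555-{i:04d}", "+371-0000-0001") for i in range(200)]
window += [("+52-555-0001", "+52-555-9999"), ("+52-555-0002", "+52-555-9999")]
baseline = {"+371-0000-0001": 2, "+52-555-9999": 3}

suspects = suspected_callback_targets(window, baseline)
```

Comparing each destination with its own history, rather than with one global limit, is a small step toward the adaptive behavior the article argues for, although a real system would learn these baselines continuously rather than take them as fixed inputs.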

Lately, crime rings have started launching data-based attacks on services like WhatsApp. Even traditional Wangiri attacks are morphing. Instead of a simple missed call tempting the user to call back, at cost, criminals are using social engineering messages encouraging victims to call a premium rate number, for example: “Trying to deliver flowers to your wife. Call to confirm delivery time.” Analysts need tools that help them detect emerging, morphing fraud. Machine learning does this.

Machine learning and visualizations are highly complementary. Humans often want to see visualizations that convey the data, for extra understanding and insight into the fraud technique or method. Anomaly detection on its own shows outliers but loses useful context. When you combine the two you have a very powerful detection and analysis approach.

This can be extended to any area where data analytics are used. Network fraud, subscription fraud, international revenue share fraud and revenue assurance are all natural first deployment models for big data machine learning applications, since the ROI is easily and immediately proven, providing a model for other usage scenarios. More and more fraud will occur on the data network in future, so gaining visibility into the characteristics of data usage will be paramount. Due to the vast amount of data flowing across telecoms networks, big data analytics capabilities are essential. Key to operators’ success in this area will be the ability to tap into an enterprise data lake.

Machine learning analytics can be applied across any industry or organization to detect issues ranging from mechanical breakdown to cyber threats. This will be vital as IoT devices proliferate. Beyond IoT, data-driven applications will change the face of most industries on the planet. Telecommunications companies are the foremost users of big data, and their experience with machine intelligence and machine learning will forge a path for other industries attempting to harness information to protect and grow their businesses.

The original version of this article was published on the blog of Argyle Data. It has been reproduced with their permission.

Padraig Stapleton

Padraig Stapleton is VP of Engineering at Argyle Data. He was previously VP of Engineering and Operations for the Big Data group in AT&T. Argyle Data develops machine learning analytics tools for mobile operators, enabling detection, prediction and prevention of fraud, security and revenue assurance issues.

  • Dror

    Hello,
    Would like to know where did you get the 29bn$ figure? (as the 2017 CFCA survey was not yet published).
    Is it a formal number or just a ball park estimation you used?
    Regards,
    Dror

    • $29 billion is the correct figure per the CFCA 2017 survey. It seems the results of the CFCA 2017 survey have been shared through some channels but maybe not as widely as some previous surveys. For example, the figures have been presented at the CFCA’s own events, and at the RAG Johannesburg conference as noted here: http://commsrisk.com/10-defining-moments-at-rag-johannesburg/