Will Fraudsters Just Outwit Machine Learning Too?

The recently published Q3 2018 DataVisor Fraud Index Report hints at an intriguing possibility that will terrify some suppliers of fraud management systems.

Existing systems, including some machine learning solutions, are reactive as they require prior labels and kick in after the fraud has been committed and damage done. Old attack labels offer limited value as they can only detect fraud based on features and attributes that are already defined and trained, yet new attacks continue to change and come in different ways.

Put simply, DataVisor argues that fraudsters will confound attempts to detect their crimes if machine learning relies upon historic knowledge of what constitutes a fraud. I suspect this concern is partly genuine, and partly bunk. A paradigm that only involves searching for patterns that were historically associated with fraud will obviously be less effective at ‘detecting’ new patterns that have never previously been associated with fraud. But this is a general problem no matter how you approach data. Some patterns will indicate fraud is taking place, whilst many other patterns will have nothing to do with fraud. Saying you should not be constrained by history just raises the question of how you spot a previously unknown fraud when you have no data that says you should be looking for it!

Though DataVisor has dressed the issue in modern clothes, the challenge of knowing what you are supposed to be looking for has a long history in computer science and data analysis. For example, Alan Turing and the other cryptographers who worked at Bletchley Park were tasked with decoding German messages during World War 2. They could not program their primitive computers to confirm that they had found the mathematical operations needed to decipher a message without first defining the sequences of letters the computer should find once a message had been decrypted. The key that unlocked that particular puzzle was that they could make good guesses about some of the contents of the messages they worked on, including the fact that many German messages ended with Heil Hitler.
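The idea of using a guessed fragment of the message to recognise a successful decryption can be shown with a toy sketch. The code below assumes a simple shift cipher, which is far weaker than the real Enigma machine and is used here only for illustration; the function names and the sample message are invented for this example.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_text(text, shift):
    """Shift each A-Z letter by `shift` places; leave other characters alone."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def find_key_with_crib(ciphertext, crib):
    """Try every possible shift and keep those whose output ends with the crib.

    Without the crib, the program would have no way of telling which of the
    26 candidate decryptions is the meaningful one.
    """
    hits = []
    for key in range(26):
        candidate = shift_text(ciphertext, -key)
        if candidate.endswith(crib):
            hits.append((key, candidate))
    return hits


# Hypothetical message, encrypted with a shift of 7.
secret = shift_text("WEATHER REPORT FOLLOWS HEIL HITLER", 7)
matches = find_key_with_crib(secret, "HEIL HITLER")
print(matches)
```

The search only succeeds because somebody supplied the expected ending in advance, which is the same dependency on prior knowledge that applies to fraud detection.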

The DataVisor report is obviously biased towards the techniques the company sells, but it does offer some useful insights, including:

  • more sophisticated fraudsters may do a low-volume test to see if their attack will work, then wait over a month before unleashing as large an attack as they can;
  • to avoid repetitive patterns, fraudsters use internet proxies so it appears that a fake account has been accessed from a range of different locations, but the resulting geographical spread may be too great, with the apparent user seemingly jumping from country to country; and
  • whilst fraudsters can use scripts to create a different email address for each account they control, those addresses will each conform to an easily-discerned pattern.
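The last two observations lend themselves to simple checks. The following is a minimal sketch, not a description of how DataVisor or any other vendor actually works: the function names, the one-hour window, the threshold of three accounts and the sample data are all assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timedelta


def email_shape(address):
    """Reduce an address to a template by replacing each run of digits with '#'."""
    local, _, domain = address.lower().partition("@")
    return re.sub(r"[0-9]+", "#", local) + "@" + domain


def suspicious_shapes(addresses, min_count=3):
    """Return templates shared by at least `min_count` addresses (assumed threshold)."""
    counts = Counter(email_shape(a) for a in addresses)
    return {shape: n for shape, n in counts.items() if n >= min_count}


def country_jumps(logins, window=timedelta(hours=1)):
    """Count consecutive logins from different countries within `window`.

    `logins` is a list of (timestamp, country_code) pairs for one account.
    """
    ordered = sorted(logins)
    jumps = 0
    for (t1, c1), (t2, c2) in zip(ordered, ordered[1:]):
        if c1 != c2 and t2 - t1 <= window:
            jumps += 1
    return jumps


# Invented sample data.
emails = [
    "promo001@example.com",
    "promo002@example.com",
    "promo003@example.com",
    "alice@example.com",
    "bob77@example.com",
]
logins = [
    (datetime(2018, 9, 1, 12, 0), "US"),
    (datetime(2018, 9, 1, 12, 10), "BR"),
    (datetime(2018, 9, 1, 12, 25), "VN"),
    (datetime(2018, 9, 2, 9, 0), "VN"),
]

flagged = suspicious_shapes(emails)
jump_count = country_jumps(logins)
print(flagged, jump_count)
```

Both checks are rules written by someone who already knows what to look for, so they illustrate the point made above: they catch the scripted pattern only after somebody has thought to describe it.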

You can obtain the Q3 2018 DataVisor Fraud Index Report by registering your details here.

Eric Priezkalns
Eric is the Editor of Commsrisk. Look here for more about the history of Commsrisk and the role played by Eric.

Eric is also the Chief Executive of the Risk & Assurance Group (RAG), a global association of professionals working in risk management and business assurance for communications providers.

Previously Eric was Director of Risk Management for Qatar Telecom and he has worked with Cable & Wireless, T‑Mobile, Sky, Worldcom and other telcos. He was lead author of Revenue Assurance: Expert Opinions for Communications Providers, published by CRC Press. He is a qualified chartered accountant, with degrees in information systems, and in mathematics and philosophy.