I loved this short article by Aditya Rana of Dataconomy, which talks about 7 mistakes that people make when analysing data. The piece focuses on the financial sector, but the flaws in human reasoning are general in nature, and no amount of data can compensate for the mistakes made by a person who applies the wrong thought process to that data. Financial institutions obviously use lots of data to manage their risk, and telcos should do the same. If bankers still make really bad decisions about the risks they take, then so might we. Here is the list of common errors that Rana identified, plus some links to other articles if you want to learn more.
- Confirmation bias. We tend to seek out and favour evidence that supports what we already believe, and discount evidence that contradicts it. Check out some articles about confirmation bias from Science Daily.
- Selection bias. Conclusions drawn from a sample that does not represent the wider population can be badly skewed. There is a good entry about selection bias at RationalWiki.
- Failure to identify outliers. A few extreme values can drag averages a long way from what is typical, so they need to be found and handled deliberately. Irad Ben-Gal of Tel Aviv University wrote a good paper on outlier detection which can be downloaded from here.
- Simpson’s Paradox. A trend that appears within each group of data can disappear, or even reverse, when the groups are combined, so the way data is grouped can lead to the opposite conclusion from the one that should be drawn. This page at VUDlab beautifully illustrates how this happens, and gives real-life examples.
- Confounding variables. A variable that has not been considered may be driving both of the variables that have, so overlooking it can lead people to jump to unwarranted conclusions about the correlations they identify, as succinctly explained in this post on Explorable.
- Non-normality. Assuming a normal distribution can lead to biased results if the data does not really fit a bell curve. Statistics How To discusses non-normal distributions in this piece.
- Overfitting and underfitting. Models that are too complicated end up fitting the noise in the training data, while models that are too simple miss real patterns; both problems easily occur with machine learning. The issues are described in some detail in these lecture notes by John Bullinaria of the University of Birmingham.
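To make the outlier point concrete, here is a minimal z-score check in Python. This is just a common rule of thumb, not the method from Ben-Gal's paper, and the call-duration figures are invented for illustration:

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean.

    A simple rule of thumb; it assumes roughly bell-shaped data and can
    itself be distorted by the outliers it is trying to find.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Made-up daily call counts, with one suspicious value.
calls = [120, 130, 118, 125, 122, 131, 119, 2000]
print(zscore_outliers(calls, threshold=2.0))  # → [2000]
```

Note that with the default threshold of 3.0 the extreme value here is *not* flagged, precisely because it inflates the standard deviation so much; simple detectors need sanity-checking too.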
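Simpson's Paradox is also easy to reproduce in a few lines of Python. The admissions-style numbers below are invented purely for illustration: group A does better than group B within each department, yet worse overall, because most of A's applications go to the harder department:

```python
# Invented numbers: (admitted, applied) per department.
applications = {
    "Group A": {"easy dept": (9, 10), "hard dept": (30, 100)},
    "Group B": {"easy dept": (80, 100), "hard dept": (2, 10)},
}

for group, depts in applications.items():
    for dept, (admitted, applied) in depts.items():
        print(f"{group}, {dept}: {admitted / applied:.0%}")
    admitted_total = sum(a for a, _ in depts.values())
    applied_total = sum(n for _, n in depts.values())
    print(f"{group}, overall: {admitted_total / applied_total:.0%}")
# Group A wins in each department (90% vs 80%, 30% vs 20%)
# but loses overall (39/110 ≈ 35% vs 82/110 ≈ 75%).
```

Whether the per-department or the combined figure is the "right" one depends entirely on the question being asked, which is exactly why this trap catches people.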
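Finally, overfitting versus underfitting can be shown without any machine-learning library. The sketch below, on made-up and roughly linear data, fits five training points two ways: an exact interpolating polynomial (one parameter per point) and a plain straight line. On a held-out point, the "perfect" polynomial does far worse:

```python
# Made-up training data: roughly y = x plus a little noise.
train_x = [0, 1, 2, 3, 4]
train_y = [0.1, 1.2, 1.9, 3.2, 3.9]
test_x, test_y = 5, 5.1  # held-out point

def lagrange(xs, ys, x):
    """Degree len(xs)-1 polynomial passing through every training point:
    zero training error, i.e. a deliberately overfitted model."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def line(x):
    """A crude straight line through the first and last training points."""
    slope = (train_y[-1] - train_y[0]) / (train_x[-1] - train_x[0])
    return train_y[0] + slope * (x - train_x[0])

poly_err = abs(lagrange(train_x, train_y, test_x) - test_y)
line_err = abs(line(test_x) - test_y)
print(f"polynomial error: {poly_err:.2f}")  # 4.50
print(f"line error:       {line_err:.2f}")  # 0.25
```

The polynomial reproduces the training data exactly but has memorised its noise, so it extrapolates badly; the simple line misses some training points yet generalises far better. A model too simple for the data would fail in the opposite direction, which is the underfitting half of the trade-off.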