Academics Spread Misinformation about Misinformation on the Web

What follows is the headline from a recent article written by Chris Stokel-Walker for TIME.

Cloudflare Is One of the Companies That Quietly Powers the Internet. Researchers Say It’s a Haven for Misinformation

Cloudflare is a content delivery network that often does a tremendous job of defeating DDoS attacks and capturing the precise moment when networks go offline. They provide an important service but nobody generates clicks (or secures research grants) by observing a business is doing a good job. That is why the most common story you ever read about Cloudflare is that they have not prevented people from visiting [insert the name of a website it is fashionable to be hysterical about here]. The pattern has been evident for almost as long as Cloudflare has been in business. This is because the thrilling emotions generated by a widespread moral panic cannot be dissipated by sensible individuals questioning why the private sector should routinely interfere with content that is legal. Nor can the mob’s fury be quelled by intelligently asking why governments do not take more responsibility for prohibiting reprehensible content when there are ‘liberal’ democracies where the police will devote time to harassing the author of a tweet that is lawful. It goes without saying that most of the people who criticize Cloudflare for not censoring content will unironically tell you net neutrality laws are vital to democracy because telcos would otherwise censor almost everything on the web.

There are occasions when it is appropriate for businesses to choose to censor content, such as protecting children from graphic sexual images. Judgement needs to be exercised when content is appropriate for some, but not others. But we have a common procedure we can all follow when there is content that nobody should see. That procedure is based in law. Societies have a way to formulate law, follow law, and enforce law. Most of the people engaged in criticizing businesses like Cloudflare want to sidestep the law because they want to exert outsize influence over what the rest of us may see. Sadly, some of these people are employed as academics, a vocation formerly associated with freedom of thought that is gradually transforming into a production line for conformists who are as lazy as they are dogmatic. When I read academic output I treat it like any other source of information. That makes me different to many professional journalists; unlike them, I have a zero-tolerance policy for bullshit found in academic papers just like I despise bullshit found anywhere else. When I read a journalist writing up a story about misinformation, and then find the research it quotes is itself based on misinformation, I feel compelled to call it out. Which brings me to today’s topic…

… a new study suggests that Cloudflare also plays an outsized role in propping up sites that peddle hate and misinformation. The May 2022 study from Stanford University researchers analyzed which services hosted 440 of the most prominent misinformation websites in the world.

They found that Cloudflare was a safe haven for toxicity.

I am not going to analyze anything but this excerpt from Stokel-Walker’s TIME article. If I can show these few sentences are riven with misleading assertions then it demonstrates we should always be wary of everything we read, even when reading an article that warns us not to be too trusting.

A New Study Suggests…

The study referred to by Stokel-Walker is documented in a paper entitled “On the Infrastructure Providers That Support Misinformation Websites” by Catherine Han, Deepak Kumar and Zakir Durumeric of Stanford University. It was published in the proceedings of the Sixteenth International AAAI Conference on Web and Social Media, an event held in June 2022. That means the publishing of this work is new. But not all the work is new. The single most influential input to this ‘new’ research is a list of supposed misinformation websites which nobody has updated since April 2017, and which was probably out of date even then. Five years is a long time on the web. The list of “440 misinformation and hate sites” that the Stanford researchers downloaded was actually a lot longer, but they were forced to ignore hundreds of websites on the list because they no longer exist.

While the OpenSources master list contains 826 websites, this list was published in 2017, and because of this, many of these sites are unavailable today.

The Most Prominent Misinformation Websites in the World?

You might expect Stokel-Walker to be circumspect about whether the websites reviewed really are “the most prominent misinformation websites in the world” given that the research ignored any website created in the last five years, including the specific website that Stokel-Walker complains about in five of the opening six paragraphs of his article. Anyone who impartially examines the so-called OpenSources master list will find it is composed of the arbitrary bugbears of the kind of people who go to American universities. You will not find any misinformation sites written in Chinese, or French, or Swahili, because those are not languages spoken by the kind of people who study ‘global media’ at US universities. Nor were sites added to the list because reputable authorities had blacklisted them. These URLs are in this list because somebody in an American university was able to visit them, and so purposefully chose to visit them with the intention of being upset by their content.

What you will find on the list includes:

  • a UK newspaper, which means the accuracy of its content is regulated per UK law;
  • the website of a US politician who served 22 years in Congress;
  • The Intercept, an American news organization devoted to civil liberties that has won the National Magazine Award and multiple awards from the New York Press Club;
  • Wikileaks, the website created by Julian Assange, who was acclaimed by readers to be TIME Person of the Year in 2010.

Perhaps you would choose not to obtain information from these sources. Stokel-Walker would probably highlight other websites on the list that you might like even less. But this amateurish and erratic list of websites was used as the primary input to what was supposed to be academic-grade research. It is highly tendentious to smear websites like The Intercept as supplying misinformation, just as it would be wrong of me to describe TIME as a source of misinformation just because I can show Stokel-Walker wrote one incredibly crappy article based on some incredibly crappy research. And whilst many object to the way Wikileaks operates, it is absurd to complain their goal is to spread misinformation by publishing genuine documents that were leaked to them. Liberal arts professors in the USA dislike Wikileaks because they made Hillary Clinton’s emails public, but Clinton did not deny those emails were genuine.

The original source of the list of websites that is pompously and misleadingly described by the Stanford academics as ‘the OpenSources master list’ is Melissa Zimdars, an Associate Professor of Communication and Media whose most famous publication is… this list of websites. So what scientific method did Zimdars use to compile this list, per her own account in The Washington Post?

So this past Monday morning, I put together a resource for students in my media class, “False, Misleading, Clickbait-y, and/or Satirical ‘News’ Sources.” I populated it using some notes I’ve been taking in recent weeks, a site that recently tricked me with a too-good-to-be-true story about Aaron Rodgers, my observations of websites I follow on Facebook relying more and more on hyperbole and outrage to drive traffic, suggestions and resources provided by my own Facebook friends (many of whom are also media and communication scholars) and emails from strangers who stumbled on the growing list. Shortly after creating it, I set it to be visible to the general public. My own Facebook friends were already asking to share it with other teachers or professors whom I didn’t know.

I doubt that a genuine database of ‘the most prominent misinformation websites in the world’ would be influenced by one person being fooled by clickbait about American football quarterback Aaron Rodgers. Zimdars’ list is neither authoritative nor credible. And nobody subsequently performed a credible or authoritative edit of the websites listed by Zimdars. A list which reads like a stereotypical list of bugbear websites for an American liberal arts professor is like that because it really is a list of bugbear websites casually thrown together by an American liberal arts professor. It was compiled in a hurry, with no great thought, and was taken down by Zimdars a short while later. The only reason the list is still on the web, and hence accessible to the Stanford researchers, is that some unimportant computer geek who works in the private sector decided to copy Zimdars’ list to the GitHub code repository.

There are services which seek to comprehensively catalog web content so that ISPs can implement effective filters, whether those filters are prescribed by law or are voluntarily chosen by customers. Such catalogs include sources of hate, alongside those for pornography, violence and so on. But the Stanford researchers were too cheap to pay for that quality of information, so they based their work on Zimdars’ list instead. They referred to the ‘OpenSources Project’ to obfuscate their reliance on somebody’s copy of a hand-me-down list that Zimdars distanced herself from soon after she compiled it. Zimdars’ desire to play down the importance of the list becomes even more understandable when you review the goofy list of justifications she used to determine which websites she added. The Stanford researchers deliberately rewrote those rules to make them sound like they might have been applied systematically. The original wording of Zimdars’ rules illustrates why that would be impossible. Those rules include the following:

  • Writing Style Analysis. Does the website follow AP Style Guide or another style guide? Typically, lack of style guide may indicate an overall lack of editing or fact-checking process.
  • Aesthetic Analysis. Like the style-guide, many fake and questionable news sites utilize very bad design. Usually this means screens are cluttered with text and heavy-handed photoshopping or born digital images.
  • Source Analysis. Does the website mention/link to a study or source? Look up the source/study. Do you think it’s being accurately reflected and reported? Are officials being cited? Can you confirm their quotes elsewhere? Some media literacy and critical scholars call this triangulation: Verify details, facts, quotes, etc. with multiple sources.

A process that tries to distinguish information from misinformation based on writing style and aesthetics is not going to be a rigorous process. However, I would agree that source analysis is pertinent. Source analysis is exactly the kind of technique I favor when checking articles I read. It is the technique I am using now, demonstrating that TIME published a sensational article that depends too much on one weak academic paper that did not accurately describe the true nature of the ‘OpenSources Project’ which was the most important influence on the findings it presents.

Misinformation, Hate and Toxicity

The Stanford researchers are as guilty as Stokel-Walker in seeking to conflate hate with misinformation solely to amplify the attention their work would receive. The following quote is taken from the Stanford paper:

…we crawl and analyze the network dependencies of 440 misinformation and hate speech websites — which because nearly all hate speech websites also spread misinformation, we refer to in aggregate simply as misinformation websites.

This is misleading. At one point the paper breaks out the analysis of hate websites as a proper subset of all misinformation websites. But Zimdars’ list included very few websites that were also tagged as spreading hate. The Stanford authors refer to ‘misinformation and hate’ on six different occasions, ostensibly so they can frame their conclusions as relevant to both. They do this despite their list only containing 30 websites tagged by Zimdars as spreading hate, compared to 410 websites which are described as presenting misinformation but which do not spread hate. There is only one plausible reason to repeatedly refer to ‘misinformation and hate’ websites when reviewing a sample of websites where hate covers less than 7 percent of the total. That is to enable journalists like Stokel-Walker to engage and motivate readers by exaggerating the extent of uncensored hate found on the internet. This can be used as the thin end of a wedge that begins by insisting businesses like Cloudflare must do more to censor hate, but soon pushes them to censor websites solely because they challenge the opinions of most academics and journalists.

Stokel-Walker knew he would bolster his arguments by asserting Cloudflare is a ‘safe haven for toxicity’ but this generalization is drawn from a triflingly small selection of the total number of websites that use Cloudflare’s service. Of the 30 sites arbitrarily listed as purveyors of hate per Zimdars’ list, the Stanford researchers found just nine that also use Cloudflare. Over 7.5 million websites use Cloudflare’s services. To suggest that 9 websites out of 7.5 million prove Cloudflare is a ‘safe haven’ for toxicity is to layer hyperbole on top of hyperbole. These are the tactics employed by unreliable commentators seeking to manufacture a moral panic in order to bypass the law and pressure private companies into censoring content more heavily.
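Readers who want to check the proportions for themselves can do so in a few lines. What follows is only a rough sketch in Python: the counts are the ones quoted above from the Stanford paper, while the helper name `share` and the constant names are my own inventions for illustration and do not come from the paper.

```python
def share(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * part / total


# Counts quoted in this article; the constant names are hypothetical.
LIST_SIZE = 440          # websites analyzed by the Stanford researchers
HATE_SITES = 30          # of those, tagged as spreading hate
HATE_ON_CLOUDFLARE = 9   # of the hate sites, found to use Cloudflare

print(f"Hate sites as a share of the list: {share(HATE_SITES, LIST_SIZE):.1f}%")
print(f"Other sites as a share of the list: {share(LIST_SIZE - HATE_SITES, LIST_SIZE):.1f}%")
print(f"Hate sites on Cloudflare as a share of the list: {share(HATE_ON_CLOUDFLARE, LIST_SIZE):.1f}%")
```

The same function can be pointed at any estimate of Cloudflare's total customer base if you want to see how small nine websites looks against it.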

I have made my argument, and I know some people will not like it. There are many ways that journalists and academics would like to silence individuals like you and me, if we dare to criticize their work. One of those methods is to assert that the content on a website like this cannot be reliable unless it conforms to the Associated Press (AP) style guide; I mostly follow the Chicago style guide, but not religiously, and I do not uniformly impose it on others who submit articles to Commsrisk. Another way to sideline people like you and me is to insist that anyone employed as an academic at a university must be the real expert, no matter how shoddy their work, and that everybody else can only be an ill-informed amateur by comparison. I hope my research into the process used to compile the ‘OpenSources master list’ makes you wary of such assumptions, especially as the Stanford research team volunteered that this list “has been used extensively in prior research”.

Please pause and think before you recycle articles you see on the web. The content might not be reliable, even if it was published by an outlet with as strong a brand as TIME or is supposedly citing academic research. You might want to do your own research, even if the article’s conclusion supports your existing opinion. We can all strive to be open to considering multiple sources of information. To do that, we must have the freedom to check them all. The reason we can find flaws in the work of Stokel-Walker, Han, Kumar and Durumeric is that people like them are not currently able to dictate to businesses like Cloudflare, no matter how much they would like to.

Eric Priezkalns
Eric is the Editor of Commsrisk. Look here for more about the history of Commsrisk and the role played by Eric.

Eric is also the Chief Executive of the Risk & Assurance Group (RAG), a global association of professionals working in risk management and business assurance for communications providers.

Previously Eric was Director of Risk Management for Qatar Telecom and he has worked with Cable & Wireless, T‑Mobile, Sky, Worldcom and other telcos. He was lead author of Revenue Assurance: Expert Opinions for Communications Providers, published by CRC Press. He is a qualified chartered accountant, with degrees in information systems, and in mathematics and philosophy.