A Big Problem with AI Search Results
I write every day. Most days I need to look up information to support my writing. To assure the reader that I'm not just making stuff up, I like to provide references for my comments. I am, after all, an academic, and as such it is incumbent on me to provide accurate, valid information. References provide evidence of validity.
Way back in the day, doing a digital search required some fairly advanced skills in building a search query that would return the results you needed. Today, thanks to advanced search engines like Google, even a sloppy, half-defined search prompt will give you the results you're looking for. The challenge, though, is determining whether those results are truly valid. That is, the search engine delivers links to websites and other resources that answer the question, but it is often unclear whether those resources are trustworthy. They could be bullshit. I don't know if you've noticed, but there's a lot of that online.
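For instance (a purely illustrative query, not one I actually used), building a careful search once meant stringing together exact-phrase quotes, an OR, a site restriction, and an exclusion by hand:

    "media literacy" ("source evaluation" OR "fact checking") site:.edu -forum

Today a sloppy prompt does all of that for you, but either way you are still left to judge whether what comes back is trustworthy.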
And this presents two problems.
First, there is the question, again, of the validity of the resources returned by the search engine. Determining whether the information on a given webpage is correct is sometimes (perhaps often?) rather challenging. It's easy to become complacent and simply take the author at their word rather than try to validate their claims. This is, at least in part, why we see so many conspiracy theories: people don't take the time to question what is presented to them.
The second problem is that, more and more often, search engines are not only returning links to various websites but are actually providing AI-generated 'answers' to the question posed in the search. Why is that a problem? Well, the AI generates its responses from the content of the websites available to it. The probability that bullshit information is incorporated into the AI response is, I think, fairly high. It's the old "garbage in, garbage out" principle. If we can't confirm the validity and accuracy of a website's content (as noted in the first problem above), how can we ensure that the AI-generated results don't include that bullshit content? Now, the AI folks contend that they've taken steps to minimize this issue, but have they, really? How can we be sure? How can THEY be sure, given that apparently no one really understands how AI works?1
These AI-generated responses can be helpful, quickly and fairly concisely answering a search question, but what is missing are the links to the sources from which those responses were generated. You see, when an academic or researcher makes a claim in their writing, they must back up that claim with evidence. That evidence often comes in the form of links or footnotes back to the literature or data on which the claim relies. This gives the reader some degree of assurance that the claim is based on valid information. It also allows the reader to go to those sources and confirm for themselves that the referenced resource does, in fact, support the claim. There have been occasions in my own experience where the referenced resource did not support the claim. Perhaps the author misinterpreted the data, twisted it to fit their needs, or conveniently ignored the parts that didn't support their position. Or, in at least one case that I encountered, they provided a resource that was clearly unrelated to the claim, with, I suspect, the hope that the reader, seeing the reference, would assume it was accurate without actually investigating it. And, apparently, they were correct, as quite a few supposedly well-educated educators failed to question the unsupported claim!
When doing my own literature reviews, I often find it helpful to look at the resources the author of a given article used in their writing. This not only allows me to validate their comments but also provides me with additional resources to review in preparing my own writing. After all, if I found the author's perspective useful, surely the resources they used would be useful to me as well. And this is what is missing from AI responses.
AI-generated responses are not necessarily bad, but they do need to be vetted for accuracy. The only way to properly accomplish this is by reviewing the sources upon which those responses are based. Those responses should always include the references used to generate them. Not only will this allow the reader to validate the accuracy of the interpretation, but it will also provide them with resources they can reference and include in their own work.
Hopefully this problem will be rectified at some point, sooner rather than later. But in the meantime, I will try to avoid relying on AI-generated responses and do my own digging to ensure that what I write is accurate and not rooted in bullshit.