AI-generated scientific research is polluting the online academic information ecosystem, according to a worrying report published in the Harvard Kennedy School Misinformation Review.
A group of researchers investigated the prevalence of research articles showing evidence of artificially generated text on Google Scholar, an academic search engine that makes it easy to search for research published across many academic journals over the years.
The team specifically investigated the misuse of generative pre-trained transformers (GPTs), a type of large language model (LLM) that includes now-familiar software such as OpenAI’s ChatGPT. These models can rapidly interpret text prompts and generate responses in the form of numbers, images, and long passages of text.
In the study, the team analyzed a sample of scientific papers found on Google Scholar that showed signs of GPT use. The selected papers contained one or two common phrases that conversational agents (typically chatbots) powered by LLMs tend to produce. The researchers then examined how widely those questionable papers were distributed and hosted across the internet.
“The risk of what we call ‘evidence hacking’ increases significantly when AI-generated research spreads to search engines,” said Björn Ekström, a researcher at the Swedish School of Library and Information Science and co-author of the paper, in a University of Borås release. “This can have tangible consequences, as incorrect results can seep further into society, and possibly into more and more domains.”
Google Scholar’s method of pulling in research from around the internet, according to the team behind the new study, does not screen out papers whose authors lack a scientific affiliation or peer review; the engine pulls in academic bycatch (student papers, reports, preprints, and the like) along with research that has passed a higher bar of scrutiny.
The team found that two-thirds of the papers they studied were at least partially produced through the undisclosed use of GPTs. Of the GPT-fabricated papers, the researchers found that 14.5% concerned health, 19.5% concerned the environment, and 23% concerned computing.
“Most of these GPT-fabricated papers were found in non-indexed journals and working papers, but some cases included research published in mainstream scientific journals and conference proceedings,” the team wrote.
The researchers outlined two main risks that come with this development. “First, the proliferation of fabricated ‘studies’ that permeate all areas of the research infrastructure threatens to overwhelm the scholarly communication system and jeopardize the integrity of the scientific record,” wrote the group. “The second risk lies in the increased possibility that convincing-looking scientific content is actually fraudulently produced using AI tools and optimized to be retrieved by publicly available academic search engines, especially Google Scholar.”
Because Google Scholar is not an academic database, it is easy for the public to use when searching for scientific literature. That’s good. Unfortunately, it is more difficult for members of the public to separate the wheat from the chaff when it comes to reputable journals; even the distinction between a piece of peer-reviewed research and a working paper can be confusing. Moreover, the AI-generated text was found in some peer-reviewed works as well as in less-scrutinized writing, indicating that GPT-fabricated work muddies the waters of the entire online academic information ecosystem, not just work that exists outside of most official channels.
“If we can’t trust that the research we read is real, we risk making decisions based on incorrect information,” said study co-author Jutta Haider, also a researcher at the Swedish School of Library and Information Science, in the same release. “But as much as it’s a question of scientific misconduct, it’s a question of media and information literacy.”
In recent years, publishers have failed to screen out some scientific articles that were, in fact, complete nonsense. In 2021, Springer Nature was forced to retract over 40 papers in the Arabian Journal of Geosciences, which, despite the journal’s title, covered a wide variety of topics, including sports, air pollution, and children’s medicine. Besides being off topic, the articles were so poorly written as to be nonsensical, and their sentences often lacked a coherent line of thought.
Artificial intelligence is exacerbating the issue. Last February, the publisher Frontiers caught flak for publishing a paper in its journal Cell and Developmental Biology that included images generated by the AI software Midjourney; specifically, anatomically incorrect images of signaling pathways and rat genitalia. Frontiers retracted the paper a few days after its publication.
AI models can be a boon to science; such systems can decode faint texts from the Roman Empire, find previously unknown Nazca Lines, and reveal hidden details in dinosaur fossils. But AI’s impact can be as positive or negative as the people who use it.
Peer-reviewed journals—and perhaps hosts and search engines for academic writing—need guardrails to ensure that technology works in the service of scientific discovery, not against it.