Mon. Dec 23rd, 2024
Study Reveals Patterns That Expose Machine-Generated Text

Approximately 15% of research papers published in non-English-speaking countries are processed by AI.

What's the story

A recent study shows that large language models (LLMs) such as ChatGPT overuse certain words and may draw on a limited vocabulary.

The researchers likened this “excess word usage” in the biomedical literature to the way the impact of COVID-19 is measured through “excess deaths.”

The study suggests that approximately 10% of abstracts in 2024 were processed by LLMs.

LLMs’ unprecedented impact on scientific language

The researchers noted that the impact of LLM use on scientific writing is “truly unprecedented, surpassing even the dramatic vocabulary changes caused by the COVID-19 pandemic.”

The researchers took a novel approach: they measured “excess word usage” in the biomedical literature, much as epidemiologists track “excess deaths.”

The study analyzed 14 million biomedical paper abstracts published between 2010 and 2024.

LLMs increase the frequency of certain words

The research team used papers published before 2023 as a baseline and compared them with papers published after LLMs became widely available commercially.

They found that there was a 25-fold increase in the frequency of less common words like “delves,” and a nine-fold increase in the frequency of words like “showcasing” and “underscores.”

Even common words like “potential,” “discovery,” and “important” saw increases in usage of up to 4%.
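The comparison described above can be illustrated with a minimal sketch. It assumes a simplified setup in which a word's frequency is the fraction of abstracts that contain it, and the baseline is just an earlier set of abstracts; the study's actual procedure may differ. The function names and the toy abstracts are hypothetical and serve only as illustration.

```python
import re

def doc_frequency(abstracts, word):
    """Fraction of abstracts that contain `word` at least once (case-insensitive)."""
    if not abstracts:
        return 0.0
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = sum(1 for text in abstracts if pattern.search(text))
    return hits / len(abstracts)

def excess_usage(baseline, recent, word):
    """Compare a word's frequency in recent abstracts against a baseline.

    Returns the two frequencies, their ratio (None if the word never
    appears in the baseline), and their difference.
    """
    q = doc_frequency(baseline, word)  # baseline frequency
    p = doc_frequency(recent, word)    # recent frequency
    ratio = p / q if q > 0 else None
    return {"baseline": q, "recent": p, "ratio": ratio, "gap": p - q}

# Toy data, invented for illustration only.
baseline_abstracts = [
    "This paper delves into protein folding.",
    "We report a cohort study of asthma.",
    "A trial of a new vaccine is described.",
    "We measure enzyme activity in yeast.",
]
recent_abstracts = [
    "This study delves into gene regulation.",
    "Our work delves into tumor growth.",
    "The review delves into sleep disorders.",
    "We measure enzyme activity in mice.",
]

result = excess_usage(baseline_abstracts, recent_abstracts, "delves")
print(result)
```

In this toy example the word appears in one of four baseline abstracts and three of four recent ones, so the ratio is 3 and the gap is 0.5. A rare word can show a large ratio with a small gap, while a common word can show a small ratio with a larger gap, which is why both measures are useful.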

Excessive use of language: an indicator of AI’s impact

Looking at excess words between 2013 and 2023, the researchers found that they were mostly terms tied to global events, such as “Ebola,” “coronavirus,” and “lockdown.”

However, in 2024, the excess words were mostly style words rather than content words.

Of the 280 excess style words identified that year, two-thirds were verbs and about one-fifth were adjectives.

Using these excess style words as an indicator of LLM use, the researchers estimated that roughly 15% of papers published in non-English-speaking countries such as China, Taiwan, and South Korea are now processed by AI.

This compares with 3% in English-speaking countries such as the UK.

They acknowledged that native English speakers may be better at hiding their use of LLMs.
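One way to picture how marker words can turn into a usage estimate is the sketch below. It assumes a deliberately simplified rule: the share of recent abstracts containing at least one marker word, minus the same share in a baseline set, is treated as a rough lower-bound estimate of LLM-processed abstracts. This is an illustration under that assumption, not the researchers' exact procedure; the function names, the marker list, and the toy abstracts are hypothetical.

```python
import re

def share_with_any(abstracts, markers):
    """Fraction of abstracts containing at least one of the marker words."""
    if not abstracts or not markers:
        return 0.0
    alternatives = "|".join(re.escape(m) for m in markers)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    hits = sum(1 for text in abstracts if pattern.search(text))
    return hits / len(abstracts)

def estimated_llm_share(baseline, recent, markers):
    """Excess share of marker-containing abstracts, floored at zero.

    Abstracts processed by an LLM without any marker word are not
    counted, so the result is best read as a lower bound.
    """
    gap = share_with_any(recent, markers) - share_with_any(baseline, markers)
    return max(gap, 0.0)

# Toy data, invented for illustration only.
style_markers = ["delves", "showcasing", "underscores"]
older_abstracts = [
    "This result underscores the need for screening.",
    "We report a cohort study of asthma.",
    "A trial of a new vaccine is described.",
    "We measure enzyme activity in yeast.",
]
newer_abstracts = [
    "This study delves into gene regulation.",
    "Showcasing a new assay, we test ten samples.",
    "This finding underscores the role of diet.",
    "We measure enzyme activity in mice.",
]

estimate = estimated_llm_share(older_abstracts, newer_abstracts, style_markers)
print(estimate)
```

Because authors who edit out telltale words leave no trace in such a count, an estimate of this kind would understate usage among writers who are better at hiding it, which is consistent with the caveat the researchers raised about native English speakers.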