Dmitry Kobak and his team have produced a pioneering study of how large language models (LLMs) are reshaping biomedical publications. Published in Science Advances, the study explicitly examines the unnecessary vocabulary that fills scientific writing, and it points to a concerning shift in style that emerged in parallel with the introduction of AI-assisted writing tools.
According to the study, prior to 2024 nouns accounted for 79.2% of all excess words. In 2024, by contrast, verbs made up 66% of excess words and adjectives 14%. This dramatic change in the character of the excess vocabulary points to the powerful role that AI tools now play in shaping how researchers convey their discoveries.
Shift in Writing Patterns
The retrospective analysis performed by Kobak et al. draws attention to a stark shift in the biomedical literature around the start of 2024. The team estimates that at least 13.5% of papers published in 2024 show signs of LLM processing, an almost four-fold increase over recent years and a remarkable demonstration of how quickly AI technologies are being adopted into scholarly writing.
Upon closer analysis, the authors split the superfluous words into content words and style words, affording a more complete picture of how LLMs are changing the language of scientific writing. The researchers computed a frequency ratio (r) for each word and flagged terms with r greater than 90, along with words whose excess frequency gap (δ) exceeded 0.05. These metrics offer a useful baseline for ongoing research into the impact of AI on scholarly publishing.
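The study's exact computation is not reproduced here, but the idea behind the two metrics can be sketched in a few lines of Python. This sketch assumes r is the ratio of a word's observed 2024 frequency to its counterfactually expected frequency (extrapolated from pre-LLM years) and δ is their difference; the word list and numbers below are invented for illustration, not taken from the study.

```python
# Minimal sketch: flagging "excess vocabulary" using a frequency ratio r
# and a frequency gap delta. Frequencies are the fraction of abstracts
# containing each word; the data here is purely illustrative.

R_THRESHOLD = 90        # ratio threshold reported in the study
DELTA_THRESHOLD = 0.05  # gap threshold reported in the study

# word -> (observed 2024 frequency, expected counterfactual frequency)
frequencies = {
    "delve":   (0.0024, 0.00002),  # hypothetical numbers
    "crucial": (0.085,  0.031),
    "patient": (0.280,  0.279),
}

def excess_words(freqs):
    flagged = {}
    for word, (observed, expected) in freqs.items():
        r = observed / expected if expected > 0 else float("inf")
        delta = observed - expected
        # A word counts as excess vocabulary if either metric crosses
        # its threshold.
        if r > R_THRESHOLD or delta > DELTA_THRESHOLD:
            flagged[word] = {"r": round(r, 1), "delta": round(delta, 4)}
    return flagged

print(excess_words(frequencies))
# -> flags "delve" (r = 120.0) and "crucial" (delta = 0.054),
#    but not "patient", whose usage barely changed.
```

The two thresholds are plausibly complementary: a ratio cutoff like r > 90 catches rare words whose usage explodes, while a gap cutoff like δ > 0.05 catches already-common words that become noticeably more frequent even though their ratio stays modest.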
Variations in LLM Usage
Kobak’s team further found significant discrepancies in how LLMs are used across different fields of research, countries, and venues of publication. This variation indicates that not all disciplines or geographical regions embrace and implement AI technologies at the same rate or to the same degree.
Understanding these distinctions is imperative. They illustrate larger issues and trends in the academic community and raise key questions about what the future of scientific communication should look like. The findings suggest that fast-moving fields have been the first to adopt AI tools, while others are holding back, concerned about the biases these new technologies may perpetuate.
“…can introduce biases, as it requires assumptions on which models scientists use for their LLM-assisted writing, and how exactly they prompt them.” – Authors of the study.
Challenges in Distinguishing Authorship
Advanced AI tools such as ChatGPT and Google Gemini are rapidly getting better at producing prose that passes for human. In practice, this makes it increasingly difficult to distinguish human-written content from AI-generated or AI-altered content. This blurring of lines raises important ethical questions about originality and integrity in scientific writing.
Researchers have a responsibility to understand how these technologies shape their work. Leaning too heavily on LLMs risks homogenizing the language and ideas of science, a trend that could discourage creativity and innovation in the scientific community.