
SC Connecticut News

Sunday, November 24, 2024

Study shows challenges in distinguishing human from AI writing

Peter Salovey, President | Yale University

A recent study led by Dr. Lee Schwamm, associate dean for digital strategy and transformation at Yale School of Medicine, has highlighted the challenges faced by peer reviewers in distinguishing between human and AI-generated writing. The research involved an essay contest for the journal Stroke, featuring submissions from both humans and large language models (LLMs) like ChatGPT.

The study revealed that when authorship was blinded, reviewers struggled to accurately identify whether essays were written by humans or AI. Interestingly, when reviewers believed an essay was AI-generated, they were less likely to rate it as the best on a given topic.

"This study is a wakeup call to editorial boards, and educators as well, that we can’t sit around waiting for someone else to figure this out," Schwamm stated. He emphasized the importance of developing policies regarding the use of AI in scientific manuscripts.

The experiment invited Stroke readers to submit essays on controversial topics within the stroke field. In total, 22 human submissions were received. Additionally, four LLMs—ChatGPT 3.5, ChatGPT 4, Bard, and LLaMA-2—each wrote one essay per topic. Literature citations in the AI essays were corrected beforehand so that reviewers could not identify them by the reference errors LLMs commonly make.

Reviewers from the Stroke editorial board were tasked with attributing authorship and rating essays for quality and persuasiveness. Surprisingly, they correctly identified authorship only half of the time. "It was like a flip of a coin," Schwamm remarked.

AI-generated essays received higher quality ratings than those written by humans. A multivariable analysis showed that persuasiveness was the only factor independently associated with correctly identifying AI authorship.

Although reviewers could not reliably distinguish human from AI-generated content, they rated an essay as the best on its topic only 4% of the time when they believed it was written by AI. "The reviewers weren’t able to tell human- and AI-generated essays apart," Schwamm noted.

The findings suggest that as LLMs improve, peer reviewers may find it increasingly difficult to detect machine-written content while also revealing a bias against such content. This raises important questions about AI's role in scientific writing.

Some journals initially banned LLMs but later allowed researchers to declare their use of AI tools. "We have to fight the natural tendency to view the use of LLMs as unfair," Schwamm commented. He suggested that instead of seeing AI negatively, it should be viewed as a tool akin to spell checkers or word processors.

For non-native English-speaking researchers in the U.S., this technology could level the playing field positively. "I think it’s going to level the playing field in a good way," Schwamm concluded.
