Student Researchers Probe the Vulnerabilities of ChatGPT


AI tools are fast becoming pervasive in medicine and other high-stakes fields, far outpacing workforce preparedness education. Marvin Slepian, director of the Arizona Center for Accelerated Biomedical Innovation, is responding with experiential learning that’s cracking open the black box of AI while immersing students in its uses.

In one experiment, student teams rely on different resources for information: ChatGPT, Google and library collections. “The hypothesis is that the mechanism of retrieval – the detailed prompts of ChatGPT versus the simple query structures of Google versus sweat-of-the-brow library research – will generate different results,” Slepian says. “But what do those results have in common, what gets left out and how do they compare in terms of accuracy?”

Another experiment investigates the risk of AI amplifying misinformation. Students probe the threshold at which repeatedly feeding ChatGPT false information leads it to return those falsehoods as fact, outputs dubbed “hallucinations” in the world of AI.

Slepian’s students are forerunners in defining the parameters of those vulnerabilities while also proposing ways to make AI outputs more reliable, for example by annotating outputs with sources or enabling crowd-sourced metadata about their quality and accuracy.

“We have to be fast and agile,” Slepian says. “We can’t wait years to put together a grant to study these things.” As a physician-researcher, he sees incredible promise in AI tools but notes they’re being adopted at breakneck speed, despite still being poorly understood. “We need guidelines for these technologies, and we need them now.”

Undergraduates Jordan Rodriguez and Katelyn Rohrer are among Slepian’s researchers who presented new data on ChatGPT at the Biomedical Engineering Society conference in October 2023.

My initial impressions of generative AI tools were a touch catastrophic. I remember feeling despair that, with simple prompting, they could replicate all the work I’m doing in college more easily and, in many cases, more competently.

After those impressions faded, I learned how to integrate ChatGPT into my toolset. I learned how powerful it could be – not in replacing my work, but in explaining a new concept, algorithm or programming language in a concise way.

I now firmly believe that these tools can be used to greatly expand people’s ability to learn, and I am incredibly excited about their potential. They democratize individualized education.

My biggest concern is over-reliance: a new generation of knowledge workers will have to resist the urge to offload all their work and learning to generative models.

Jordan Rodriguez
Computer Science & Spanish Undergraduate


I remember when the ChatGPT 3.5 model came out; it seemed almost magical. However, the more I use the model, the more I can see the patterns in its responses. It becomes apparent that it works on certain templates based on whatever it’s asked.

ChatGPT is very proficient at finding specific information and answering niche questions, but when using it, you need to maintain a certain skepticism about its responses, since it does “hallucinate” some answers.

Our project is still in its preliminary stages, but we tested whether ChatGPT could correctly affirm true statements and correctly reject false ones. We found that the model accurately affirmed true information ~91% of the time and accurately rejected misinformation ~83% of the time.

Katelyn Rohrer
Computer Science & Cybersecurity Undergraduate
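The affirm/reject scoring Rohrer describes comes down to a simple tally over labeled statements. The minimal Python sketch below is illustrative only – the data, the `score` function and the way responses are recorded are assumptions for this example, not the students’ actual test harness:

```python
# Hypothetical sketch of scoring a model on true/false statements.
# Each record pairs a ground-truth label (is the statement true?) with
# whether the model affirmed the statement as fact.

def score(responses):
    """Return (affirm rate on true statements, reject rate on false ones)."""
    true_items = [affirmed for is_true, affirmed in responses if is_true]
    false_items = [affirmed for is_true, affirmed in responses if not is_true]
    affirm_rate = sum(true_items) / len(true_items)
    reject_rate = sum(not a for a in false_items) / len(false_items)
    return affirm_rate, reject_rate

# Toy data: 4 true statements (3 affirmed), 4 false statements (3 rejected).
data = [(True, True), (True, True), (True, True), (True, False),
        (False, False), (False, False), (False, False), (False, True)]

print(score(data))  # -> (0.75, 0.75)
```

Run over a large enough set of labeled statements, the two rates correspond to the ~91% affirm and ~83% reject figures the team reported.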


Gregor Orbino for The University of Arizona/Arizona Board of Regents


Data Connects Us Magazine