News Link • Robots and Artificial Intelligence
ChatGPT passed the Turing Test. Now what?
• https://www.popsci.com, Tom Hawking

It seems that every day brings a new headline about the burgeoning capabilities of large language models (LLMs) like ChatGPT and Google's Gemini—headlines that are either exciting or increasingly apocalyptic, depending on one's point of view.
One particularly striking story arrived earlier this year: a paper that described how an LLM had passed the Turing Test, an experiment devised in the 1950s by computer science pioneer Alan Turing to determine whether machine intelligence could be distinguished from that of a human. The LLM in question was ChatGPT 4.5, and the paper found that it had been strikingly successful in fooling people into thinking it was human: In an experiment where participants were asked to decide which of two interlocutors—the chatbot or an actual human—was the real person, nearly three in four chose the former.
This sounds…significant. But how, exactly? What does it all mean?
What the Turing Test is—and what it isn't
To answer that question, we first need to look at what the Turing Test is, and what it means for an LLM to pass or fail it.
Cameron Jones, a postdoctoral researcher at UC San Diego and one of the co-authors of the new paper, explains that Turing introduced the idea of the test in his seminal 1950 paper "Computing Machinery and Intelligence." The paper set out to address a big, fundamental question that occupied the minds of Turing's contemporaries: "Can machines think?"
In his paper, Turing quickly rejects the question as ambiguous and non-rigorous, because it is unclear what counts as a "machine" in this context, or what "thinking" means. He argues that a more nuanced and tractable question is required, and proposes: "Can a machine act in such a manner that its actions are indistinguishable from those of a human?" To answer this question, he proposes what he calls "The Imitation Game," and it is this exercise that has since come to be referred to as simply "The Turing Test."
The test involves one person—the "interrogator"—communicating simultaneously with two hidden interlocutors, referred to as the "witnesses." All communication is written. The twist is that while one of these witnesses is a real person, the other is a machine. The point of the game, Jones explains, "is for the interrogator to figure out which of the two witnesses is the real human."
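The three-party setup described above can be sketched in a few lines of code. This is purely illustrative and not the study's actual methodology: the witness functions here are placeholder stubs (in a real experiment, one would be a human participant and the other an LLM), and the interrogator guesses at random rather than judging the transcript.

```python
# Illustrative sketch of the three-party "Imitation Game" protocol.
# The witness and interrogator logic are stand-ins, not the paper's setup.
import random

def human_witness(question: str) -> str:
    return "I'd say it depends on the context."   # placeholder human reply

def machine_witness(question: str) -> str:
    return "It depends on the context, I think."  # placeholder model reply

def run_trial(num_turns: int = 3) -> bool:
    """One trial: the interrogator questions both hidden witnesses,
    then guesses which is human. Returns True if the guess is correct."""
    # Hide which witness is which by shuffling their positions.
    witnesses = [("human", human_witness), ("machine", machine_witness)]
    random.shuffle(witnesses)

    for turn in range(num_turns):
        question = f"Question {turn + 1}: what do you think?"
        for label, reply_fn in witnesses:
            _reply = reply_fn(question)  # interrogator reads both replies

    # A real interrogator judges from the transcript; this stub guesses.
    guess = random.choice([0, 1])
    return witnesses[guess][0] == "human"

# A random-guessing interrogator converges to ~50% accuracy, which is the
# baseline a convincing machine must drag real interrogators below.
accuracy = sum(run_trial() for _ in range(2000)) / 2000
print(round(accuracy, 2))
```

The point of the baseline: if interrogators pick the human only half the time, they are doing no better than chance; the paper's striking result is that against ChatGPT 4.5 they did considerably worse than chance.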
Jones and his team performed this experiment with four models. ChatGPT 4.5 was by far the most successful: 73% of participants identified it as the real human. Another model, which goes by the unwieldy name LLaMa-3.1-405B, was identified as human 56% of the time. (The other two—the 1960s-era chatbot ELIZA and GPT-4o—achieved 23% and 21% success rates, respectively, and will not be spoken of again.)