The Turing test still filters out artificial intelligence bots
What is consciousness? Can an artificial machine really think? Does the mind consist only of neurons in the brain, or is there an intangible spark at its core?
For many people, these have been key considerations for the future of artificial intelligence. But British computer scientist Alan Turing decided to set such questions aside in favor of a much simpler one:
Can a computer talk like a human?
This question led to an idea for measuring artificial intelligence that would become popularly known as the Turing test.
The Turing test
In his 1950 paper, “Computing Machinery and Intelligence,” Turing proposed the following game:
A human judge holds a text conversation with unseen participants and evaluates their responses. To pass the test, a computer must be able to take the place of one of the participants without noticeably changing the results.
In other words, a computer would be considered intelligent if its conversation could not easily be distinguished from a human’s. Turing predicted that by the year 2000, machines with 100 megabytes of memory would be able to pass his test easily.
But he may have jumped the gun. Although today’s computers have far more memory than that, few have passed the test, and those that have done well have relied more on clever ways of fooling the judges than on sheer computing power.
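The pass criterion described above can be made concrete with a small scoring sketch. It assumes the wording of Turing's 1950 paper, which framed the prediction as an average interrogator having no more than a 70 percent chance of making the right identification after five minutes of questioning. The function names and the list-of-verdicts format are hypothetical, chosen only for illustration.

```python
def correct_identification_rate(verdicts):
    """Fraction of conversations in which the judge correctly spotted the machine.

    verdicts is a list of booleans: True means the judge identified the
    machine correctly, False means the judge was fooled.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    return sum(verdicts) / len(verdicts)


def passes_imitation_game(verdicts, threshold=0.70):
    """Return True if judges were right no more often than the threshold.

    The default of 0.70 follows the 70 percent figure in Turing's paper;
    treating it as a hard pass mark is an assumption of this sketch.
    """
    return correct_identification_rate(verdicts) <= threshold
```

For example, a program that is correctly identified in 6 of 10 conversations would count as passing under this reading, while one identified in 9 of 10 would not.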
A few successes on the Turing test
Although it was never put through a formal test, the first program with some claim to success was called “ELIZA”.
Using only a simple and fairly short script, it managed to mislead many people by imitating a psychiatrist, encouraging them to talk more and reflecting their own questions back at them.
Another early script, called “PARRY”, took the opposite approach by imitating a paranoid schizophrenic who kept steering the conversation back to his own preprogrammed obsessions.
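The reflection trick attributed to ELIZA above can be sketched in a few lines. This is a minimal illustration of the general idea (match a pattern, swap the pronouns, hand the statement back as a question), not the original program's script; the rules, pronoun table, and function names are all made up for this example.

```python
import re

# Pronoun swaps used to turn the speaker's words back on them.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my",
}

# (pattern, reply template) pairs, tried in order. Hypothetical rules.
RULES = [
    (re.compile(r"\bi feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"(.*)\?$"), "Why do you ask that?"),
]

DEFAULT_REPLY = "Please tell me more."


def reflect(fragment):
    """Swap first- and second-person words in a captured fragment."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(message):
    """Return the reply for the first rule that matches, else a stock prompt."""
    text = message.strip().rstrip(".!")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return DEFAULT_REPLY


print(respond("I feel ignored by my boss."))
# prints: Why do you feel ignored by your boss?
```

Nothing here models what the words mean. The program only rearranges the user's own text, which is why such a short script can still keep a conversation going.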
Bots take advantage of the Turing test’s weakness
Their success in deceiving people highlighted one of the test’s weaknesses.
People regularly attribute intelligence to a whole range of things that are not actually intelligent. Nonetheless, annual competitions such as the “Loebner Prize” have made the test more formal, with judges knowing ahead of time that some of their conversation partners are machines.
But although the quality has improved, many chatbot programmers have used strategies similar to those of “ELIZA” and “PARRY”. The 1997 winner, Catherine, could carry on an amazingly witty and focused conversation, but mostly if the judge wanted to talk about Bill Clinton.
A more recent winner, Eugene Goostman, was given the persona of a 13-year-old Ukrainian boy. The judges interpreted its inconsistencies and awkward grammar as language and cultural barriers.
Modern bots use the available chat data
Others, such as Cleverbot, have taken a different route, statistically analyzing huge databases of real conversations to determine the best responses.
Some also store memories of previous conversations in order to improve over time. But while Cleverbot’s individual responses can sound remarkably human, its lack of a consistent personality and its inability to deal with brand-new topics are dead giveaways.
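The database-driven approach described above can be illustrated with a toy retrieval sketch. This is not Cleverbot's actual algorithm, which the text does not specify. It assumes a simple word-overlap (Jaccard) score as a stand-in for the statistical analysis, and the conversation log, function names, and threshold are invented for the example.

```python
import re

# Tiny stand-in for a database of logged exchanges (made-up data).
CONVERSATION_LOG = [
    ("hello there", "Hi! How are you today?"),
    ("what is your name", "People call me all sorts of things."),
    ("do you like music", "I love music. What do you listen to?"),
    ("how old are you", "Old enough to know better."),
]

FALLBACK_REPLY = "I'm not sure what you mean."


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_response(message, log=CONVERSATION_LOG, min_score=0.2):
    """Reply with the logged answer whose prompt best matches the message."""
    query = tokenize(message)
    scored = [(jaccard(query, tokenize(prompt)), reply) for prompt, reply in log]
    score, reply = max(scored, key=lambda pair: pair[0])
    if score < min_score:
        return FALLBACK_REPLY  # nothing similar has been seen before
    return reply
```

A message close to something in the log, such as "Do you like jazz music?", gets a fluent reply, while a topic absent from the log falls through to the stock answer. Each reply is also chosen independently of the last, so nothing in this sketch keeps the bot's character consistent from one turn to the next.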
The Turing test is still relevant
Who in Turing’s day could have predicted that today’s computers would be able to pilot spacecraft, perform delicate surgeries, and solve massive equations, but still struggle with the most basic small talk?
It turns out that human language is an amazingly complex phenomenon that cannot be captured by even the largest dictionary.
Chatbots can be thrown off by simple pauses, such as “umm,” or by questions that have no correct answer.
Even a simple conversational sentence such as, “I took the juice out of the fridge and gave it to him, but forgot to check the date,” requires a wealth of underlying knowledge and intuition to parse.
It turns out that simulating human conversation requires more than just increasing memory and processing power, and as we get closer to Turing’s goal, we may finally have to deal with all those big questions about consciousness.