Here’s a riddle for you:
What is often used to help us search (typically accompanied by a human handler), because it has a sometimes super-human ability to sniff out traces, and to find what we are unable to detect; is often described by humans as ‘clever’ and ‘smart’ but in fact has no powers of reasoning whatsoever; is often hopelessly, sometimes comically, mistaken by human standards; and is unable to explain anything it does?
The answer I’m thinking of is not “man’s best friend” or any other animal. It doesn’t, in fact, share anything with us in terms of mutual connection to a shared world – of walks, of lampposts and trees, of swimming, of food, of giving birth or raising the young.
The answer is so-called artificial intelligence (AI). Statistical artificial intelligence, to be exact, or machine learning (ML). The specific ML technology – whether it’s, for example, a deep artificial neural network or a Bayesian network – doesn’t matter for the purposes of this article.
I’ve been looking for a way of putting statistical AI’s actual capabilities – as opposed to all the hype – into perspective. And I think I’ve found it. Statistical AI is like a dog. To be fair, it’s a dog that can beat the world’s best players at the game of Go, which amounts to more than finding your slippers. However, fundamentally, machine learning is far more akin to an olfactory (smell) technology than to a cognitive or intelligent technology.
Human intelligence involves reasoning. It’s applicable to different problems and different contexts. Arguably, it needs to be able to explain its own processes. “I just know the answer is 74 but I can’t explain it” is an intuition, not the application of intelligence.
By those criteria, statistical AI is not intelligent but olfactory, in the sense that it ‘sniffs’ patterns out of data rather than reasoning about them. DeepMind’s AlphaGo program sniffs the next move from the available data of the board state. A program that ‘recognises’ cats in a collection of pictures sniffs them out from the pixels.
A statistical characteristic of data is akin to a scent, not a statement in any calculus. A program that is trained to detect that characteristic is, essentially, detecting a scent in the data.
The thing about a scent is that it has no (denotative) meaning – it’s not like a word or other symbol – and it’s impossible to say how we know it for what it is – for example, that it’s the smell of grass, of rain, of toast. It just is. You can produce a chemical analysis of newly cut grass but you still wouldn’t be able to say how such and such a combination of chemicals produces what is, symbolically speaking – or mentally speaking, if you like – that particular smell.
Divination without meaning or deduction is the way statistical AI – that is, machine learning – works. It’s called a ‘black box’ technology, because its workings are opaque.
The consequences of using an olfactory as opposed to intelligent (symbolic, logical, mathematical) technology are:
- No error model. It’s not possible to say how often it’s going to be wrong or how wrong it’s going to be. It’s logically impossible to know these things because, while the statistical model may have mathematical properties, the application of those statistics to an unknown set of data – data with unknown logical properties – does not.
- No explanations. Whether right or wrong in a particular instance, it’s not possible to say why a statistical AI was right or was wrong. There is no sequence of deductive steps we can point to, verify, discuss, audit. I’ve read of some approaches to creating an account of what a statistical AI did, but they’ve all been bogus – like getting another statistical AI to work it out (sic).
- No applicability across domains. Statistical AI is applied to specific domains: games of Go, cats in images on the internet, etc. You could try DeepMind’s AlphaGo program on cat pictures, but it almost certainly wouldn’t work. There is no conceptual framework, no internal structure, providing a bridge from one to the other.
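To make the first of these points concrete, here is a toy sketch in Python. It is an illustration under invented assumptions, not a model of any real system: the data generator, the one-feature ‘learner’ and all the names (make_data, train_stump, cue_agreement) are hypothetical. The learner is offered a genuine but noisy signal and an incidental cue that happens to track the label in the training data. It latches onto the cue, scores well on held-out data from the same source, and drops to roughly coin-flip accuracy once the cue stops tracking the label.

```python
import random


def make_data(n, cue_agreement, rng):
    """Generate n rows of ((real_signal, incidental_cue), label).

    real_signal agrees with the label 75% of the time.
    incidental_cue agrees with the label with probability cue_agreement.
    All of this is invented purely for illustration.
    """
    rows = []
    for _ in range(n):
        label = rng.random() < 0.5
        real = label if rng.random() < 0.75 else not label
        cue = label if rng.random() < cue_agreement else not label
        rows.append(((int(real), int(cue)), int(label)))
    return rows


def accuracy(feature, rows):
    """Fraction of rows where the chosen feature equals the label."""
    return sum(x[feature] == y for x, y in rows) / len(rows)


def train_stump(rows):
    """A deliberately tiny 'learner': keep whichever single feature
    matches the label most often in the training rows."""
    n_features = len(rows[0][0])
    return max(range(n_features), key=lambda i: accuracy(i, rows))


rng = random.Random(0)
train = make_data(2000, 0.98, rng)     # cue tracks the label closely
held_out = make_data(2000, 0.98, rng)  # same source as the training data
deployed = make_data(2000, 0.50, rng)  # cue no longer tracks the label

chosen = train_stump(train)
held_out_acc = accuracy(chosen, held_out)
deployed_acc = accuracy(chosen, deployed)

print("feature chosen:", "incidental cue" if chosen == 1 else "real signal")
print(f"held-out accuracy: {held_out_acc:.2f}")
print(f"accuracy after the cue stops tracking the label: {deployed_acc:.2f}")
```

The held-out score says nothing about the last number, and nothing inside the trained program tells you which of the two you will get on data you haven’t yet seen.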
In fact, it’s worse than that. To continue the dog analogy, an actual dog is something we know quite a lot about. It inhabits our world of sensory and bodily existence. Dogs can behave unaccountably and strangely – even dangerously – but we understand their limitations well. They’re bound by physics and biology.
Statistical AI, on the other hand: well, let’s say that to the extent it’s like a dog it’s an alien dog. A precocious, alien dog that smells alien smells on Earth, not the ones we know. If you put this precocious, alien dog into a domain to train it, you’ve no idea what kind of ‘nose’ it will develop, for what kind of smells. And you’re easily misled. Let’s suppose you train it to smell for people. Quite a few times it finds people, and you’re impressed. But then you discover that it’s ignoring black people, because it ‘smells’ them as gorillas, not people. In all essential respects, that’s a real example applied to images on the internet. And it stems from the fact that this precocious alien dog has no reasoning power for understanding what we mean when we train it; it knows nothing of life on Earth beyond the training data; and it is not bound by any of the principles we (seek to) live under. It shares no relationship to our world as a real (terrestrial) dog would: it knows only the data it’s been given, data with no meaning, only a latent smell which it itself has conjured into existence.
For the reasons given above, statistical AI is not suitable for any serious engineering use where errors have significant implications. I would never, for example, get into a driverless car that used it for any purpose affecting safety.
And, with an unknown error model, it’s not going to replace any but entirely procedural jobs of relatively little significance. It’s even specious in many cases to say that it can help experts. Is a doctor going to accept statistical AI’s ‘smell test’ of an individual patient’s data without closer human examination in every case, whether the result is positive or negative? Would cancer specialists pay complete heed to the alleged cancer-detecting ability of (real) dogs alone, again whether positive or negative? Nope. You can apply the same to law, or commercial translation, or any domain where errors matter.
Which is not to deny that job loss through automation is real, although a relatively small proportion of it is achieved through AI of any shape or form.
Statistical AI is mainly a tool for applications where a statistically advantageous result is judged to outweigh the harm of the individual mistakes that have been encountered so far. (We can’t know what future errors will be made, at what cost.)
Driving its development is what we might call statistical capitalism: capitalism which exploits statistical effects. This has been the basis of gambling for centuries but has burgeoned in the digital age. The first (digital) statistical capitalists didn’t care if a particular choice of advertisement or a choice of product recommendation was absurd from the perspective of human judgement (“You’ve watched Girl on a Motorcycle, you might like this Haynes manual for the CB500”). A statistical capitalist only cared if, overall, they made a profit by tweaking an essentially zero-cost technology which, on average, led to sufficient sales.
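The arithmetic behind that indifference is simple enough to write down. The figures in this sketch are invented purely for illustration (a 2% hit rate, a margin of 5.00 per sale, a cost of 0.001 per recommendation shown), and the function name is hypothetical.

```python
def expected_profit(hit_rate, margin_per_sale, cost_per_recommendation):
    """Average profit per recommendation shown."""
    return hit_rate * margin_per_sale - cost_per_recommendation


# Invented figures, for illustration only.
hit_rate = 0.02                  # share of recommendations that lead to a sale
margin_per_sale = 5.00           # profit on each sale
cost_per_recommendation = 0.001  # near-zero cost of showing one

miss_rate = 1 - hit_rate
profit = expected_profit(hit_rate, margin_per_sale, cost_per_recommendation)

print(f"{miss_rate:.0%} of recommendations miss")
print(f"average profit per recommendation: {profit:.3f}")
```

On these made-up numbers 98% of recommendations miss, yet each one shown still earns about 0.099 on average. How absurd the misses are never enters the calculation.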
As the AI mindset takes hold, the set of things statistical capitalists don’t care about is growing. They don’t care if their taxi algorithms operate with a concomitant reduction in drivers’ rights. Or if their automated querying system is inadequate for your question, given that the savings from making their human team unemployed outweigh the negative effects of bad customer service in certain cases.
They don’t care if their human resource software is prejudiced against minority and/or female job applicants and thus unfairly eliminates them from interview, since it saves the company from paying human HR operatives to go over CVs. And the white male staff they end up with seem to meet the (low, I would argue) expectations within their culture.
Where does this leave us?
“Every day we read that digital computers play chess, translate languages, recognize patterns, and will soon be able to take over our jobs.”
The reader could be forgiven for taking this to be a statement from 2017, particularly if we were to substitute the game Go for chess. In fact the statement was made 45 years ago in 1972, by Hubert L. Dreyfus in What Computers Can’t Do, his critique of the claims then being made for AI – which was what is known as symbolic or semantic AI, not statistical AI. Whatever one thinks of his argument, Dreyfus correctly observed that the claims and predictions of his day were greatly exaggerated. Nonetheless, the 1980s saw new inflated claims for the capabilities of the next big thing in AI, ‘expert systems’. I recall reading that my doctor was to be replaced by a machine any minute. After all, doctors simply follow the rules they’ve been trained in for diagnosis and treatment – don’t they? – and computers are good at rules. I also recall reading in the 1980s that a new technology called artificial neural networks was the breakthrough that AI needed. Despite these pronouncements, the technologies of that decade were for the most part quietly dropped. The ‘AI winter’ ensued.
Now we’re told that we’re basking in the warming sun of (another) AI spring.
I’m arguing that, in epistemological terms (i.e. advancing the state of our knowledge), we’re still pretty much in winter, and nothing’s going to change any time soon. We’re no closer to an emulation of human cognition, however ‘deep’ the neural networks of the 1980s have become in the 2010s. There simply is no theoretical basis for thinking otherwise. Anyone who has observed child development knows that humans are not ‘statistical’ learning beings. We have innate structures that enable us to learn language, in particular, in very few steps considering the magnitude of the achievement – many orders of magnitude fewer than the number of steps required by machine learning for far lower capabilities.
To observe that no theory accounts for human intelligence from statistical processes is not to say that there are no underlying statistical processes in how we learn and function as intelligent beings. But we’re analogue, not digital. Simulating a nuclear power station in a computer doesn’t make that computer a nuclear power station. We’re analogue, and we evolved from molecules over billions of years.
What about pragmatism, though? Surely machine learning just works for practical purposes; look at the progress in automated translation, for example. No: sometimes it works, even much of the time, but we don’t know why, and when it’s wrong it can be arbitrarily wrong. Epistemologically speaking, we are no further along. There seems to be a tendency to report only successful cases and then to extrapolate implicitly (and falsely). Why don’t researchers write papers about when machine learning doesn’t work? Because there would be nothing interesting to say. “I used algorithm A with parameter set P and it didn’t work for data D” tells us nothing, because no significant implications can be drawn. “I used algorithm B and parameter set Q on data E and it happened to work X% of the time” tells us nothing, either, scientifically speaking, about the semantics of the domain (i.e. beyond the existence of a statistical relationship).
It’s not of any great concern to me that many claims made for statistical AI will once again be seen to be greatly inflated. I’m more worried about the statistical capitalists. Because for them, I’ve argued, statistical AI is just fine: they don’t care about its flaws as long as they make a net profit. But while sometimes those flaws are merely irritating, sometimes they lead to injustice and other serious consequences that the rest of us do care about.
It is becoming generally recognised that, as in the cases I’ve cited above, the statistical AI dogs sometimes have a biased sense of smell, one which is an affront to social justice. Some of the prejudices in AI software have been uncovered. But how would one keep up with them, test them all for bias? How should we regulate their use?
While statistical capitalism runs on apace – recently through to algorithmically-generated political messages and killer robots – the rest of us should be calling it to account.
Statistical AI works, to the extent that it does work, only when exposed to very large amounts of data. Our data. Do we want to be giving up our data so freely to the likes of Google, Facebook and the other monopolies – only for them to sell the scents they detect in it to other statistical capitalists, without checks or constraints?
Photo by Crkuberan (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons