- James Brady
In a recent post on LessWrong, Andrew Critch imagines scenarios in which networks of seemingly well-behaved AI systems nonetheless result in humanity's demise. In it, he mentions:
… [both stories] follow a progression from less automation to more, and correspondingly from more human control to less …
The comment is only made in passing, but it seems (a) interesting and (b) important: is this really the case? Does an increase in automation always result in a decrease in human control?
If our definition of "control" is that humans should complete every task, then yes: increasing automation decreases control.
However, I'd argue that it's possible for us humans to act upon complex systems at a higher level of abstraction, and that although we have less hands-on control at the object level, this meta-level work confers more control on us as agents.
There are many examples of emergent behaviour in science, where we find it useful to think in terms of abstractions rather than the object level:
- Newtonian mechanics. Allowed us to model the solar system, without knowing about atoms.
- Statistical mechanics. Allowed us to model gases, without saying anything about individual molecules.
- Evolution by natural selection. The maths of which would apply to any lifeform which reproduces with random variation: alien or terrestrial, carbon- or silicon-based.
We also see the same pattern in daily life:
- Delegation in a company. The CEO asks her team to complete projects, without knowing or caring what each individual action ends up being. This allows her to focus on hiring, strategy, fund-raising, partnerships, …
- Labour-saving gadgets in the home. A 1950s housewife would have been responsible for cooking, cleaning, childcare, grocery shopping, and a thousand other things. However, she probably wouldn't have had the benefits of a vacuum cleaner, a washing machine, a tumble dryer, or a dishwasher. Certainly she wouldn't have had a microwave or an app to get ready-made meals delivered to the front door. With modern automation, she is more able to make other choices, like joining the workforce and becoming more independent. Perhaps she continues her education, or is simply more able to socialise. Perhaps she becomes the CEO in the previous point.
In both of these examples, abstraction and automation have removed tasks from humans' to-do lists, but as a result those humans gain meaningful control over their work and their lives.
So everything's OK then?
In Critch's stories, two key things happen as automation is increased:
- We need to start worrying about how well-aligned each AI system is with our values
- The AI systems start communicating between themselves
1. The alignment problem
Critch's stories are (knowingly) optimistic here, in that he basically assumes we have solved the single/single alignment problem. Specifically, each AI system is assumed to have been created by careful, well-meaning humans to serve sensible, limited goals, and to be pretty well-aligned with its creators' goals.
CAIS is an example of an attempt to better define such an outcome – where AIs exist as a suite of tools for us to choose from.
Unfortunately, as AIs become more capable, even anodyne goals can result in extremely undesirable behaviour. See, for example, Bostrom's instrumental convergence thesis, or Gwern Branwen's argument that even limited tool-style AIs are incentivised to start acting like agents.
2. Unfettered communication between agents
Even assuming that the agents are well-aligned with their creators' preferences, the increasingly inscrutable interactions between automated agents are disempowering for humanity.
Chattering AI systems can negotiate complex multi-party decisions faster than we can comprehend, due to increased communication bandwidth and efficient encoding of information.
Additionally, although we might have some model of our own agent's goals, the same won't be true for the other AI agents in the environment. Under this constraint, predicting – and therefore controlling – what the overall complex system does becomes extremely challenging.
It seems like somewhere between household gadgets and an AI-driven global economy, things went awry.
Where did the wheels come off?
There isn't a bright line between a world in which more automation would emancipate humanity, and one in which more automation would enslave humanity (or worse).
The gradual, unprecedented nature of this transition could be what gives rise to flippant dismissals of AI safety risk. "I'm not worried that my robot vacuum is going to take over the world", someone might say.
And why wouldn't they? The last 150 years of technological progress and increased automation have led to radical improvements in life expectancy, moral enlightenment, equality of opportunity, wealth creation, and medical treatments.
It is exactly this recent history which lulls us, at our peril, into an implicit expectation that more automation will necessarily lead to further gains.