What would a robot value? An analogy for human values – part 4 of the Valuism sequence

By Spencer Greenberg and Amber Dawn Ace 

This post is part of a sequence about Valuism – my life philosophy – and is the most technical of the sequence. Here are the first, second, third, and fifth parts (though the last link won’t work until that essay is released).

Image created using the A.I. DALL•E 2


I find robots to be a useful metaphor for thinking about human intrinsic values (i.e., things we value for their own sake, not merely as a means to other ends).

Imagine that you’re programming a very smart robot. One way to do this is to give the robot an “objective function” (or “utility function”). This is a mathematical function that takes as input any state of the world and outputs how “good” that state of the world is. Suppose that the robot is programmed so that its goal is to get the world into a state that is as good as possible according to this objective function.
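To make that setup concrete, here is a minimal, hypothetical sketch in Python – the state keys and function names are invented for illustration, not anything the robot-builder would actually use. The objective function maps a state of the world to a single “goodness” score, and the robot picks whichever available action leads to the best predicted state:

```python
# Hypothetical sketch: an objective function maps a world state to a single
# "goodness" score, and the robot chooses the action whose predicted
# resulting state scores highest.

def objective(world_state: dict) -> float:
    """Score how good a state of the world is (higher is better)."""
    return world_state.get("people_helped", 0) - 5 * world_state.get("pain_caused", 0)

def choose_action(current_state, actions, predict_next_state):
    """Pick the action whose predicted resulting state scores highest."""
    return max(actions, key=lambda a: objective(predict_next_state(current_state, a)))
```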

Imagine that, in this particular case, the robot’s objective function is separable into different distinct parts: i.e., the robot cares about multiple sorts of things. We can think of these as the robot’s intrinsic values. For instance, maybe part of its objective function says that it is good to help others, another part of its objective function says it is bad to cause pain to others, and a third part says that it is bad to deceive others. Now we can describe the robot’s intrinsic values as being “help others,” “don’t cause pain,” “don’t deceive,” and so on.
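If the objective function is separable in this way, we can write it as a sum of independent terms, one per intrinsic value. Again, this is just a hypothetical sketch with invented state keys:

```python
# Hypothetical sketch: a separable objective function is a sum of
# independent terms, one per intrinsic value.

def help_others(state):        return state.get("people_helped", 0)
def avoid_causing_pain(state): return -state.get("pain_caused", 0)
def avoid_deceiving(state):    return -state.get("lies_told", 0)

INTRINSIC_VALUES = [help_others, avoid_causing_pain, avoid_deceiving]

def separable_objective(state):
    # The robot's overall objective is just the sum of its intrinsic-value terms.
    return sum(term(state) for term in INTRINSIC_VALUES)
```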

It may be the case that the robot would form intermediate goals (such as “open that door”), but the goals would ultimately be oriented toward its intrinsic values (e.g., the goal of helping others).

A neural net could be used to control the robot’s behavior, learning (based on the consequences of each action it takes) to predict which actions will lead to a good world rather than a bad one (according to its objective function).
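The paragraph above imagines a neural net doing this learning; purely to keep the sketch short, here is the same idea with a much simpler tabular learner standing in for the network (all names and numbers are invented):

```python
import random

# The robot learns, from the consequences of each action it actually takes,
# an estimate of how good the world tends to be after that action
# (as judged by its objective function).

value_estimate = {"help": 0.0, "deceive": 0.0, "do_nothing": 0.0}
LEARNING_RATE = 0.1

def update_estimate(action, observed_objective_value):
    # Nudge the estimate toward what the objective function actually said
    # about the world that resulted from taking this action.
    value_estimate[action] += LEARNING_RATE * (observed_objective_value - value_estimate[action])

def pick_action(explore_prob=0.1):
    # Mostly take the action with the best learned estimate,
    # but occasionally explore so the estimates keep improving.
    if random.random() < explore_prob:
        return random.choice(list(value_estimate))
    return max(value_estimate, key=value_estimate.get)
```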

Unlike this robot, we humans don’t have a utility function that describes what we care about, which makes things far more complex for us. But considering the simpler case of a robot can help us notice a few interesting and important things about how our own (human) values work.

1. Knowing the origin of our intrinsic values doesn’t change them 

Suppose this robot were smart enough to figure out, one day, that it has an objective function – perhaps even to figure out part or all of what that objective function is. That wouldn’t mean it would stop caring about what its objective function says is valuable. Even if it knew that a human programmed it to have that objective function, its values would stay the same: after all, the objective function describes (precisely and completely) what the robot cares about.

Similarly, if the robot discovered one day that its creator’s motives did not resemble the objective function the robot was programmed with, that wouldn’t suddenly give the robot the same objective function as its creator; the robot would merely know more about why it has the objective function that it does.

We humans are, to a shocking degree, in the same situation as this robot. Our intrinsic values are determined by some combination of genetics (honed by evolution), our upbringing, our adult life experiences, our culture, and our own reflection. If we figure out what caused our intrinsic values to be what they are, that doesn’t stop us from valuing those things! We are, in a sense, mesa-optimizers. Evolution, which is itself an optimization process, created us, and we ourselves are, to an extent, trying to optimize. What we are optimizing for is not totally unrelated to what evolution selects for (whatever helps genes propagate), but it is not the same thing either (otherwise, everyone would want to be constantly donating their sperm or eggs to sperm/egg banks).

Occasionally I encounter someone who thinks that, because we were created via evolution, and evolution is a process that selects for reproductive success, we as individuals should care about having lots of descendants. But this is a logical mistake: just because the process that created you was optimizing for X, that doesn’t mean that you yourself must value X. Just because evolution selected for traits that spread your genes doesn’t mean that spreading your genes should be your goal.

2. We can develop non-intrinsic values out of our intrinsic values

A robot can also develop values that are not intrinsic values. For instance, maybe the robot learns a rule that it should avoid taking certain types of actions, because those actions on average lead to negative value according to its objective function. The rule works well, which is why the robot learns it. Making an analogy to the way humans work, we could say that the robot “values” avoiding these behaviors – but avoiding them is not an intrinsic value, just a means to attaining its deeper values. And, like humans, the robot could end up in a situation where its instrumental values and its intrinsic values diverge.

The environment could change abruptly such that, in the long run, the robot would achieve a higher value of its objective function if it stopped avoiding those (previously punished) behaviors – but because it follows the rule of avoiding them, it never learns that. We see this sort of pattern in people in many ways. One example is that victims of abuse sometimes adopt self-protective behaviors that helped them survive in those abusive relationships but that cause problems in later relationships, making it harder to become close to the new (much kinder) people in their lives.
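Returning to the robot: here is a hypothetical sketch of how a hard avoidance rule can freeze a learned estimate, so that a change in the environment goes unnoticed (building on the toy learner above; the action names and numbers are invented):

```python
# Estimates learned before the environment changed:
value_estimate = {"help": 2.0, "risky_move": -5.0}

# Instrumental rule learned from past punishment: never take risky_move.
AVOID = {"risky_move"}

def pick_action():
    allowed = {a: v for a, v in value_estimate.items() if a not in AVOID}
    return max(allowed, key=allowed.get)

# Suppose the environment changes so that risky_move is now actually worth +10.
# Because pick_action() never selects it, its estimate is never updated,
# and the robot keeps behaving as if risky_move were still bad.
```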

3. Robots might maximize expected value, but humans don’t

When designing such a robot, a natural choice for its decision rule would be to have it try to maximize the expected value of its objective function. That is, for each action, it would effectively be evaluating, “On average, how good will the world be according to my objective function if I take this action compared to if I take the other actions available?”
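Concretely, an expected-value-maximizing decision rule might look like the following hypothetical sketch (the probabilities and payoffs are made up):

```python
def objective(state):
    return state["goodness"]  # stand-in for the robot's real objective function

def expected_value(outcomes):
    # outcomes: a list of (probability, resulting_state) pairs for one action
    return sum(p * objective(state) for p, state in outcomes)

def choose(actions_to_outcomes):
    # Pick the action whose probability-weighted average outcome scores best.
    return max(actions_to_outcomes, key=lambda a: expected_value(actions_to_outcomes[a]))

choice = choose({
    "safe_bet":  [(1.0, {"goodness": 1.0})],
    "long_shot": [(0.1, {"goodness": 20.0}), (0.9, {"goodness": -1.0})],
})
# Expected values: safe_bet = 1.0, long_shot = 0.1*20 + 0.9*(-1) = 1.1,
# so an expected-value maximizer takes the long shot.
```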

Humans don’t do this; we deviate from what an expected-value-maximizing agent would do (as has been well documented in the literature on behavioral economics and cognitive biases). Interestingly, the Von Neumann–Morgenstern utility theorem shows that any agent that makes choices so as to maximize the expected value of ANY utility function will satisfy four basic axioms (completeness, transitivity, continuity, and independence) in its behavior. Since real human behavior doesn’t satisfy these axioms, this is evidence that our behavior is not well modeled as maximizing the expected value of some utility function.

So what do humans actually do, instead? It seems we have a variety of forces that influence our behavior, including:

  • Basic urges (such as the urge to go to the bathroom)
  • Built-in heuristics (such as conserving energy unless there is a reason not to)
  • Habits (if every time recently when we’ve been in situation X we’ve done Y, we’ll likely do Y the next time we’re in situation X)
  • Mimicry (if everyone else is doing something or expects us to do it, we’ll probably do it too)
  • Intrinsic values (the things we fundamentally care about as ends in and of themselves)
  • And others besides.

Our behavior arises from a variety of interlocking algorithms (running in our brains and bodies), and these algorithms aim at different things (conserving energy, gathering energy, and so on). A human is not a system with a unified objective.
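As a rough illustration – a hypothetical sketch, not a claim about how the brain actually arbitrates between its subsystems – behavior driven by several separate processes might look less like maximizing one function and more like this:

```python
# Each subsystem inspects the situation and may or may not push for an action;
# whichever insistent-enough subsystem speaks up first "wins". No single
# objective function is being maximized anywhere.

def basic_urge(situation):
    return "find_bathroom" if situation.get("bladder_full") else None

def habit(situation):
    return situation.get("what_i_usually_do_here")

def mimicry(situation):
    return situation.get("what_everyone_else_is_doing")

def intrinsic_values(situation):
    return "help_friend" if situation.get("friend_needs_help") else None

SUBSYSTEMS = [basic_urge, habit, mimicry, intrinsic_values]  # roughly ordered by insistence

def act(situation):
    for subsystem in SUBSYSTEMS:
        suggestion = subsystem(situation)
        if suggestion is not None:
            return suggestion
    return "do_nothing"
```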

This piece was drafted on February 5, 2023, and first appeared on this site on May 7, 2023.

You’ve just finished the fourth post in my sequence of essays on my life philosophy, Valuism. Here are the first, second, third, and fifth parts in the sequence.


  
