Image by Leon-Pascal Jc on Unsplash

A Paradoxical Puzzle For Ethical Utilitarians

A utilitarian blinks. When his eyes open a split second later, he’s astounded to discover that he’s in an entirely different place. Omega stands in front of him.

“I’ve brought you here to play a game,” Omega says. “The well-being of humanity depends on your choices, so pay very close attention. You start with 100 credits.”

A screen suddenly appears with the number 100 on it. “To play the game,” Omega continues, “just think of some number of credits that you’d like to bet during the next round, and then push the blue button. You can’t bet more credits than you have remaining. Whatever number of credits you are thinking of when you push the button will be your bet for that round.”

A blue button appears.

“With each push of the blue button, there is a 99% chance that you win, in which case the credits you bet that round will increase by 10%. But there is a 1% chance that you’ll lose, in which case you forfeit all the credits you bet.”

“Let me give you an example,” Omega continues. “Suppose that you have 100 credits and bet 10 of them. If you win, which, remember, has a 99% chance of happening, your 10 credits will grow to 11 credits, so you’ll have 101 credits after that round. But if you lose, which, remember, has a 1% chance of happening, you will lose those 10 credits you bet, and so will end the round with only 90 credits. Is that clear?”

The (“maximizing”, “act”) utilitarian nods.

“Good. It’s also important to know that winning or losing in any particular round won’t affect the chance of winning or losing in any other rounds. And you can play as many rounds of the game as you like. Reality is paused while you are in this room, so if you want, you can play for a hundred years or a million years or however many rounds you choose. When you decide to stop playing, just leave the room through that door to return to your normal life. The moment you leave, I will adjust the world by adding an amount of positive utility proportional to the number of credits you have remaining. The amount of utility I will add to the world for each credit you end up with is equivalent to the utility of 1000 of the happiest lives humans have ever lived. That means there is a lot of utility on the line here, so don’t F this up.”

Omega disappears. The utilitarian thinks about the game for a while and performs some calculations.

“If I bet C credits on my first play, then the average number of credits I will get back from my first play is:

Average I get back after one play = 0.99 [chance of winning] * 1.10 [reward if I win] * C [amount I bet] + 0.01 [chance of losing] * 0 [penalty if I lose] * C [amount I bet] = 1.089 C ≈ C + 0.09 C

So that means that each time I play, the number of credits I get back is whatever I bet, increased by about 9% on average. So I maximize the average number of credits I end up with by betting all of them in my first play! If I bet 100 credits, then on average I’ll end up with 108.9, whereas if I bet 99 credits, then on average I’ll only end up with 108.811, so it maximizes expected value to bet them all.

However, nothing changes from one round to the next, since each round is statistically independent from the last. That means that every time I play, I should bet ALL my credits. And if I do that, my credits will grow exponentially, by about 9% with each play. That’s amazing!
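(The arithmetic here is easy to verify with a few lines of Python — a quick sketch; the function name is my own:)

```python
def expected_after_round(credits, bet):
    """Expected credits after one round, betting `bet` out of `credits`."""
    p_win, p_lose = 0.99, 0.01
    win = credits - bet + 1.10 * bet   # the bet grows by 10%
    lose = credits - bet               # the bet is forfeited
    return p_win * win + p_lose * lose

print(expected_after_round(100, 100))  # bet everything: EV = 108.9
print(expected_after_round(100, 99))   # hold one back: EV = 108.811
print(expected_after_round(100, 10))   # Omega's example bet: EV = 100.89
```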

Wait a moment…that doesn’t feel right. Isn’t it really risky to bet all my credits each round? I risk losing all of them. But that strategy does seem to maximize the expected total utility, so it is the most ethical thing to do.

Let me double-check my reasoning. Suppose that I bet a fraction F of my credits in one round, and I start that round with C credits. That means at the end of that round, on average, I have:

average after one round = 0.99 * 1.10 * F * C + 0.01 * 0 * F * C + (1-F) * C
= (0.99 * 1.10 * F + (1-F)) C = C + 0.089 F C

This is clearly maximized when F is as large as possible, which is F=1, so I clearly should bet all my credits on the first round to maximize the average utility! That again seems to imply I should always bet all my credits on every round, since each round is identical and independent from the others, so the decision should always be the same as in the first round.

Hmm, still seems odd. To be sure, maybe I need to think about what happens if I play k rounds in a row, rather than just one.

Let’s suppose I play for two rounds in a row, and bet a fraction F1 on the first round, and a fraction F2 on the second round. Let M1 be the multiplier I happen to get on the credits the first round (1.10 with 0.99 probability or 0 with 0.01 probability), and M2 the multiplier on the second round (which is identical to but independent from M1). Then, after two rounds where I bet first F1 then F2, my average payout is:

average after two rounds
= average[ M2 * F2 * (M1 * F1 * C + (1-F1) * C) + (1-F2) * (M1 * F1 * C + (1-F1) * C) ]
= (1 + 0.089 F1)(1 + 0.089 F2) C
= (1 + 0.089 F1 + 0.089 F2 + 0.089^2 F1 F2) C

Which, since all the coefficients are positive, will always increase as F1 increases and as F2 increases. So to maximize the expected utility, we want to use the maximum values for F1 and F2, which are just F1=1 and F2=1, implying again that we should bet all our credits on each round even if we’re playing for two rounds!
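(The closed form can also be sanity-checked by simulation — a sketch; all names here are my own:)

```python
import random

def average_two_rounds(f1, f2, start=100.0, trials=200_000, seed=0):
    """Monte Carlo estimate of the average credits after two rounds,
    betting fractions f1 and then f2 of the current stack."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        c = start
        for f in (f1, f2):
            bet = f * c
            c -= bet
            if rng.random() < 0.99:      # win: the bet grows by 10%
                c += 1.10 * bet
            # lose: the bet is simply gone
        total += c
    return total / trials

f1, f2 = 0.5, 0.8
closed_form = (1 + 0.089 * f1) * (1 + 0.089 * f2) * 100
print(average_two_rounds(f1, f2), closed_form)  # the two should nearly agree
```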

If I bet all the credits I have at each round, then the expected value after playing k rounds will be my original starting credits multiplied by the product of the expected values of M1, M2, etc., since each is independent from the others (and the expected value of a product of independent variables is the product of the expected values). Therefore:

average after k plays = 100 * 1.089^k

Wow, so the expected utility really will grow exponentially, and the more times I play, the greater that utility will be, on average! And even if there is some other way to produce an equal amount of utility in this game, through some other betting strategy, this approach will clearly do it in fewer plays than any other – and since I myself get slight disutility being stuck here, forced to play this game, all utility (including my own as a tiny part of that) is maximized playing in the more efficient way.”
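(The divergence between the average outcome and the typical outcome is easy to see in simulation — a sketch of the all-in strategy from the monologue:)

```python
import random

def all_in(rounds, rng, start=100.0):
    """Bet everything every round; return final credits (0 after any loss)."""
    c = start
    for _ in range(rounds):
        if rng.random() < 0.99:
            c *= 1.10
        else:
            return 0.0
    return c

rng = random.Random(1)
rounds, trials = 200, 50_000
results = [all_in(rounds, rng) for _ in range(trials)]

mean = sum(results) / trials
ruined = sum(r == 0.0 for r in results) / trials
print(f"mean:   {mean:.3g}  (theory: {100 * 1.089**rounds:.3g})")
print(f"ruined: {ruined:.3f}  (theory: {1 - 0.99**rounds:.3f})")
# The mean really does grow like 1.089^k, yet the overwhelmingly
# likely individual outcome is 0.
```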

He triple- and then quadruple-checks his math. Satisfied that he’s really found the utilitarian solution, he starts to play the game, betting all of his credits in each round.

After 50 rounds, he’s up to 11,739 credits. After 100 rounds, he’s at 1,378,061 credits, equivalent to more than a billion happy lives. He’s extremely tempted to leave at that point with his utility winnings, but he rechecks his calculation again, and it tells him that the average utility maximizing action (i.e., the most ethical one according to his moral philosophy) is to keep playing. At round 118, he loses, and his credits are all wiped out. With no more credits to bet, he has no choice but to leave. He returns through the door to his normal life, and the world is no better off than when the game began.
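(The numbers in this run check out — each win multiplies the stack by 1.1, and a single loss wipes it out:)

```python
credits = 100
print(round(credits * 1.1**50))    # stack after 50 straight wins: 11,739
print(round(credits * 1.1**100))   # stack after 100 straight wins: 1,378,061
print(round(0.99**117, 3))         # chance of even reaching round 118: ~0.309
# And with unbounded play, the chance of eventual ruin approaches 1:
print(0.99**10_000)                # ~2e-44
```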

His strategy of maximizing total average utility appears to be required of him as the most ethical action according to his utilitarian ethical beliefs, yet it seems to be the worst strategy possible in Omega’s game: with probability 1, it produces the lowest possible amount of utility.

Interestingly, I think the same scenario still works (i.e., presents a problem and seeming paradox for the utilitarian) even if Omega adds a rule that the game can also end on its own: the probability that the game ends automatically after exactly n completed rounds is 1/(n*(n+1)). So if you never end the game on purpose, there is a 1/2 probability the game automatically ends after the first round, a 1/6 probability it ends after 2 rounds, a 1/12 probability it ends after 3 rounds, and so on. Other than that change, the rules are exactly the same: you can still choose to end the game any time you want; it’s just that the game might now also end automatically after some round, even if you want to keep playing, with the same result as if you had ended it on purpose. Note that this specific stopping distribution is special because it implies that the expected number of rounds played is still infinite, even though the game might automatically end after any round.
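(Both claimed properties of this stopping rule are quick to check numerically — the probabilities telescope to 1, while the expected number of rounds is a harmonic series, which diverges:)

```python
# P(auto-end after exactly n rounds) = 1/(n*(n+1)) = 1/n - 1/(n+1)
N = 1_000_000
total_prob = sum(1 / (n * (n + 1)) for n in range(1, N + 1))
print(total_prob)          # telescopes to 1 - 1/(N+1), i.e. ~1

# E[rounds] = sum over n of n/(n*(n+1)) = sum of 1/(n+1)
expected_partial = sum(1 / (n + 1) for n in range(1, N + 1))
print(expected_partial)    # grows like ln(N): ~13.4 here, unbounded as N grows
```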


This piece was first written on April 7, 2018, and first appeared on my website on September 11, 2025.



Comments



  1. The nature of the game works like tracing y=1/x from right to left along the x-axis. The moment you try to maximize at x=0, you get nothing. If you had just taken a step back and let x be an infinitesimal, you’d actually maximize. But the math guides you to x=0. Pure math simply does not yield the best result, which is indeed an interesting paradox.

    Given the rules, and the fact that the human population is finite but the number of game repetitions is infinite, you actually have a solution, just not in the nature of pure maximization. Amazing essay, got me thinking a lot.

  2. There isn’t really a paradox here, and our utilitarian is quite foolish. When they choose a wager and get more credits, they do not actually increase the utility of anyone. They only increase utility by the action of *exiting the room*. So they can’t consider each move in isolation, but need to think about the longer term, in which case it is obvious that you shouldn’t bet everything until you lose. The ability to make further wagers is incredibly (even approaching infinitely) valuable, and giving that up when there is no limit on the game is foolish. (Conversely, if the game were limited to five tries, sure, wager everything every time.)

    There’s another way of thinking about it that also shows this, which is looking at the logarithmic value of the utility, that is, how many times you double it (or halve it, in the negative case) from the baseline. Again, we note that we can play forever. What’s the optimal strategy from the POV of getting to some desired level in less time (so you don’t get bored)? If you math it out, it’s risking about 89% of the tokens every round (the Kelly fraction for this game), which needs about 11 rounds of play on average to double the current tokens.

    At that point, one can contemplate possibly more interesting questions, like ‘if there is more utility than happiest-ever-for-everyone-alive-now, does it engage something like Homestuck Trickster Mode for everyone, or is it applied to future generations?’ and ‘if it does save for later generations, how does our utilitarian tear themselves away from “I’ve made everything glorious and beautiful for everyone for a million years, but with just an hour more, it could be two million years, would it be wrong to stop?” and the like’.

    1. Agreed, it does in real life – but by definition, utilitarians don’t care about risk (or risk of ruin), they just care about maximizing expected value! So I don’t think the Kelly criterion solves this for the puzzle itself.
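(For reference, the Kelly fraction for this game is easy to compute — a sketch, assuming the standard Kelly setup: win probability 0.99, and a winning bet gains 10%:)

```python
import math

p, q, b = 0.99, 0.01, 0.10       # win prob, loss prob, net gain per credit bet
kelly = (b * p - q) / b          # classic Kelly fraction
growth = p * math.log(1 + b * kelly) + q * math.log(1 - kelly)
doubling = math.log(2) / growth  # expected rounds to double at that fraction
print(kelly, doubling)           # Kelly fraction ≈ 0.89; doubling ≈ 11.1 rounds
```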

  3. I can’t directly point out the mistake in the mathematical reasoning, but we all know that it led to guaranteed ruin. This was a predictable consequence, so the expected-utility calculation must have been wrong and naive. As I understand it, if the person bet only a fraction of their credits, say 10%, they’d be essentially guaranteed to reach infinite utility over an infinite number of rounds. This solution with fractional bets seems to maximize utility, not the original one, which took into account only the expected utility after a finite number of rounds.

    I think the experiment could be strengthened. Let’s consider a player who must choose either to bet all credits or to withdraw from the lottery and retain the credits, while keeping the probabilities (99% and 1%) and the changes in credits (times 1.1, or lose everything) the same. A utilitarian seems to be required by expected-value theory to always bet, because the potential gain is larger than the potential loss. By the loss I mean the credits that the player loses as a result of the bet. I believe this loss is real, but the article deals only with the average number of credits returned to the player, not the average amount they will have available after the bet, which seems more relevant to me. As in the previous example, however, the utilitarian is invariably destined to lose the bet sooner or later.

    What I think solves the problem is either something akin to diminishing marginal returns from utility, or minimalist theories that are primarily focused on reducing suffering, not increasing pleasure. A decision-maker from the latter group will simply bet as long as there are enough sentient beings whose suffering can be relieved. Once the limit is reached, it’s no longer possible to increase credits (that is, utility) by 10%. At the stage when utility can be increased by at most around 2%, the player is indifferent between betting and leaving, because on average they will not increase the amount of utility (e.g. 1000*1.02*0.99 - 1000*0.01 ≈ 1000).

    As far as I can tell, this solution to this problem should not create paradoxes of this nature, but I might have made mistakes in the math calculations, so please do point them out to me. Thank you.

  4. There are more strategies that your character didn’t consider. For example, they could have compared:
    (A) bet 100% of available credits once
    (B) bet 10% of available credits, twenty times
    and noticed that option (B) had a higher expected utility than (A).

    Of course there are better strategies than (B) as well.
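(A quick numeric check of this (A)-versus-(B) comparison — a sketch following the article’s expected-value formula:)

```python
# (A): one all-in bet      -> expected multiplier 0.99 * 1.10
# (B): twenty bets of 10%  -> expected multiplier (1 + 0.089 * 0.10) ** 20
ev_a = 0.99 * 1.10
ev_b = (1 + 0.089 * 0.10) ** 20
print(ev_a, ev_b)   # ~1.089 vs ~1.194: (B) wins in expectation
```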

    What’s happening here is that there are an infinite number of strategies, and many of them have an infinite expected payout. In your story, you’re effectively comparing:
    (C) bet 100% of available credits, an infinite number of times
    (D) bet 10% of available credits, an infinite number of times
    and your claim is that a utilitarian “has to choose (C) because it has higher expected utility”.
    But both (C) and (D) have infinite expected utility! Neither of them is higher! They’re both infinity!

    Also, (C) and (D) are both much worse than the strategy a utilitarian would *actually* use, which would be:
    (E) choose a target number T, like 10^30. Bet 10% of available credits, and repeat until you have 0 credits or T credits, and then stop.

    Once a target number T is chosen, now all the numbers are finite, and it becomes possible to compute expected values without getting confused. At that point we can notice that we should be doing Kelly betting.

  5. I think a simpler example you could give would be:

    “A utilitarian is offered a gamble with a 0.1% chance of multiplying utility by a million, and a 99.9% chance of destroying everything.”

    I think that example captures the message you wanted to highlight with this post, that most people wouldn’t approve of an approach that ignores risk in favor of maximizing expected value.

  6. That’s a strawman utilitarian. As there’s no limit to the number of rounds, the per-round payout shouldn’t factor into the calculation. The only relevant part is avoiding wipeout, since the payout becomes arbitrarily high as long as wipeout is avoided.

    Note: The game is underspecified; it doesn’t mention if fractional credits exist.
    – If fractional credits exist, the player can’t get wiped out if not betting 100% of their credits, so they bet 90% (or 50% or 99%) at each step and then play “infinite” rounds.
    – If payouts are rounded to the nearest credit (or rounded down), then dropping below 5 (respectively 10) credits makes it impossible to make a profit. To reduce the chance of that happening as much as possible, the player should bet 10 at every step, making wipeout extremely unlikely (and in fact it gets less likely the bigger the stash grows).

    To me it’s not even relevant if the player is utilitarian. Any reasonable moral system would use the same strategy.