Technical Papers May 2026

Designing Boss Behavior Using Markov Chains

This paper introduces the concept of Markov Chains and their usefulness in planning boss fights.

Introduction

One of my favorite things about boss fights is feeling like they're thinking, processing your actions and deciding how to respond. Not literally, of course, but good bosses create the illusion of decision-making. Video games rely heavily on systems that determine how these characters behave over time.

What is a Finite State Machine?

One of the most common ways developers organize these behaviors is through a Finite State Machine (FSM). An FSM is a graph that breaks behavior down into a series of states and transitions. Think of it as a list of actions a character can take, and some clear rules around when they can take those actions.

[Figure: Finite State Machine]

For example, this Finite State Machine has four states: Idle, Attack, Run, and Jump. It specifies that the player can attack while running or idle, but not while jumping. They cannot run while jumping, and they cannot jump in the middle of an attack. Otherwise, anything goes. This particular machine is driven by the player's inputs. In games, FSMs are used to model enemy behavior, player movement, NPC interactions, and yes, even boss fights. FSMs are also closely related to a concept known as Markov Chains.
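To make those rules concrete, here is a minimal Python sketch of that input-driven FSM. The transition sets are read off the description above. The names `ALLOWED` and `handle_input` are hypothetical, and we assume a jump can only end in Idle, as in the Markov Chain example that follows. Player input can only request a state; if the table forbids the transition, the state simply doesn't change.

```python
# Transitions the FSM allows out of each state.
# The player's input requests a state; the FSM only decides whether it is legal.
ALLOWED = {
    "Idle":   {"Idle", "Attack", "Run", "Jump"},
    "Run":    {"Idle", "Attack", "Run", "Jump"},
    "Attack": {"Idle", "Attack", "Run"},  # can't jump in the middle of an attack
    "Jump":   {"Idle"},                   # assumed: a jump always lands back in Idle
}


def handle_input(current: str, requested: str) -> str:
    """Move to the requested state if the transition is allowed; otherwise stay put."""
    if requested in ALLOWED[current]:
        return requested
    return current


state = "Idle"
for pressed in ["Run", "Jump", "Attack", "Idle", "Attack"]:
    state = handle_input(state, pressed)
    print(f"{pressed:>6} -> {state}")
```

In this run, the Attack pressed mid-jump is ignored and the character stays in Jump until the Idle input arrives.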

What is a Markov Chain?

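A Markov Chain works much like an FSM, but instead of player input deciding which transition happens, each transition has a probability attached to it. The chance of moving to the next state depends only on the current state, not on how the character got there.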

[Figure: Markov Chain]

Each transition probability can be any value from 0 to 1, but the probabilities leaving a state (including any chance of staying in it) must sum to exactly 1. For example, after jumping, you can only idle, so there's a 100% chance (a probability of 1) that you'll go idle after jumping. This is the simplest form of a Markov Chain.
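In code, a Markov Chain replaces the input check with a weighted random draw. The sketch below, assuming Python's standard `random` module, stores each state's outgoing probabilities, checks the sum-to-1 rule, and samples the next state from the current state's row alone. Only the Jump row comes from the example above; the names `CHAIN`, `validate`, and `next_state` are hypothetical.

```python
import random

# Each state maps to {next_state: probability}. Only Jump's row is from the
# example above; other states' rows would be filled in the same way.
CHAIN = {
    "Jump": {"Idle": 1.0},
}


def validate(chain, tol=1e-9):
    """Raise ValueError unless every probability is in [0, 1] and each row sums to 1."""
    for state, row in chain.items():
        if any(p < 0.0 or p > 1.0 for p in row.values()):
            raise ValueError(f"{state}: probabilities must be between 0 and 1")
        total = sum(row.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"{state}: outgoing probabilities sum to {total}, not 1")


def next_state(chain, current, rng=random):
    """Sample the next state using only the current state's row."""
    row = chain[current]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]


validate(CHAIN)
print(next_state(CHAIN, "Jump"))  # Idle every time: it is the only option
```

Passing a seeded `random.Random(...)` as `rng` makes the draws reproducible, which can help while tuning.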

Designing a Boss Fight

In this paper, we're going to model a Markov Chain Finite State Machine for a boss encounter to see if we can make the boss feel like it's making decisions. During combat, bosses can have a few different states, such as attacking, defending, moving, charging an ability, recovering from exhaustion, or using special attacks. Though these behaviors may appear random to the player, they usually depend on what the boss is currently doing. For example, a boss that just used a special attack might be more likely to become exhausted, while one that is moving might be more likely to attack or defend.

At each step, the boss transitions from its current state to a next state according to a set of transition probabilities. Adjusting these probabilities changes the boss's combat patterns and difficulty, and lets each next decision feel unpredictable while still following a structured behavioral system. These systems can grow more complex when combined, as in Parallel State Machines (PSMs), where two or more FSMs run side by side. Each machine manages one aspect of behavior (for example, one handles attacks while the other handles movement), as long as those behaviors are not mutually exclusive. For our purposes, we'll stick to a single Finite State Machine and assess how the boss behaves throughout combat.

Motivation

I've been building video games in my free time for two years and have enjoyed every minute. One project I'm starting soon incorporates boss fights in the form of a rhythm game, so it's something I, and we here at Moonsap, want to get right. I've always thought of these behaviors as deterministic, meaning every time a specific situation arises, we get the same outcome. Driving this behavior through a Markov Chain should give the boss more life and make it feel like it's making decisions, even if those decisions are just probabilistic.

State Definitions

First, we need to define our states and time steps for the Markov Chain Finite State Machine. Our time steps are simple: since the duration of each action can vary, each time step represents one boss decision. At the end of each action, the boss evaluates its current state, be it attacking, defending, or anything else, and transitions into another state according to our probability matrix, which we'll define later.

  1. Moving (M): The boss repositions itself around the environment.
  2. Attacking (A): The boss performs a standard melee or ranged attack that is commonly used throughout the fight.
  3. Defending (D): The boss blocks, dodges, or shields itself.
  4. Charging an Ability (C): The boss charges a special ability and telegraphs a powerful attack.
  5. Special Attack (S): The boss executes a unique ability. Less frequent than normal attacks, but far more dangerous.
  6. Exhausted (E): After using powerful abilities or remaining aggressive for a long time, the boss becomes temporarily vulnerable.

Transition Probability Matrix (TPM)

Current \ Next    M      A      D      C      S      E
Moving (M)        0.10   0.45   0.20   0.25   0.00   0.00
Attacking (A)     0.30   0.35   0.15   0.10   0.00   0.10
Defending (D)     0.20   0.35   0.20   0.15   0.00   0.10
Charging (C)      0.00   0.00   0.05   0.05   0.85   0.05
Special (S)       0.15   0.05   0.05   0.00   0.00   0.75
Exhausted (E)     0.45   0.20   0.20   0.05   0.00   0.10

The transition probabilities were chosen to imitate believable boss behavior during combat. Moving usually leads to an attack or defense, attacking leads to a follow-up attack or movement, defending leads to an attack or movement, and charging almost always leads to a special attack. Special attacks usually leave the boss exhausted, giving the player a chance to counter. Exhaustion has a small chance of extending; otherwise it leads back into movement, attack, or defense. Our only hard rule for this boss is that special attacks cannot be cast without charging first, which is why the Special column is 0 in every row except Charging.
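Before analyzing the table, it's worth checking it mechanically. Below is a plain-Python sketch (the names `STATES`, `P`, and `check_tpm` are our own) that stores the TPM as a list of rows, confirms each row sums to 1, and enforces the one hard rule above: only Charging may lead into a Special attack.

```python
STATES = ["M", "A", "D", "C", "S", "E"]  # Moving, Attacking, Defending, Charging, Special, Exhausted

# Rows are the current state, columns the next state, both in STATES order.
P = [
    [0.10, 0.45, 0.20, 0.25, 0.00, 0.00],  # M
    [0.30, 0.35, 0.15, 0.10, 0.00, 0.10],  # A
    [0.20, 0.35, 0.20, 0.15, 0.00, 0.10],  # D
    [0.00, 0.00, 0.05, 0.05, 0.85, 0.05],  # C
    [0.15, 0.05, 0.05, 0.00, 0.00, 0.75],  # S
    [0.45, 0.20, 0.20, 0.05, 0.00, 0.10],  # E
]


def check_tpm(P, states, tol=1e-9):
    """Raise ValueError if the matrix breaks a probability rule or the charge-first rule."""
    s = states.index("S")
    for name, row in zip(states, P):
        if len(row) != len(states):
            raise ValueError(f"{name}: expected {len(states)} columns, got {len(row)}")
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"{name}: row sums to {sum(row)}, not 1")
        if name != "C" and row[s] != 0.0:
            raise ValueError(f"{name} can reach a special attack without charging")


check_tpm(P, STATES)  # passes silently for the table above
```

Running a check like this after every tuning pass catches rows that no longer sum to 1 before they turn into odd in-game behavior.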

The Advantage

The system above lets us simply say "state x has a chance to go to state y." That alone goes a long way toward making the boss feel like it's making decisions, even though each choice is pure chance. But that isn't the real strength of a Markov Chain. The advantage of using Markov Chains lies in the analysis they let us do on boss behavior.

Predicting the Future

Using the table above, we know the odds of going from any one state to any other. Given a starting state, that state's row tells us the odds of where our next decision will go. For example, if we start by Moving, the M row gives us the transition values:
M = [0.10, 0.45, 0.20, 0.25, 0.0, 0.0]
Our probability of moving again is 0.10, or 10%. Our probability of attacking is 0.45, or 45%, and so on.

But what if we wanted to know what the boss will do 10 decisions from now? We can take our Transition Probability Matrix and multiply it by itself: one multiplication gives the odds two decisions out, another gives three, and so on. Matrix multiplication isn't as daunting as it sounds. Let's pretend our boss begins every fight by moving. For this, we'll again look at just the first row of our TPM:
M = [0.10, 0.45, 0.20, 0.25, 0.0, 0.0]

We take this row and line it up against the TPM's first column, the M column, which holds each state's chance of transitioning to Moving next. We begin with the row:
M = [0.10, 0.45, 0.20, 0.25, 0.0, 0.0]
and the column:
M column = [0.10, 0.30, 0.20, 0.0, 0.15, 0.45]

We multiply each value in the row by the value in the same position in the column, then add up the results:
(0.10 * 0.10) + (0.45 * 0.30) + (0.20 * 0.20) + (0.25 * 0.0) + (0.0 * 0.15) + (0.0 * 0.45) = 0.01 + 0.135 + 0.04 + 0 + 0 + 0 = 0.185
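In matrix notation, the calculation above is one entry of the product of P with itself. Writing P_{ij} for the chance of going from state i to state j, the chance of being in state j two decisions after state i sums over every possible intermediate state k, and the same rule extends one decision at a time:

```latex
\begin{aligned}
(P^2)_{ij} &= \sum_{k} P_{ik}\,P_{kj}, \qquad (P^n)_{ij} = \sum_{k} (P^{n-1})_{ik}\,P_{kj} \\
(P^2)_{MM} &= (0.10)(0.10) + (0.45)(0.30) + (0.20)(0.20) + (0.25)(0) + (0)(0.15) + (0)(0.45) = 0.185
\end{aligned}
```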

This is our first value: the chance that the boss is Moving again on its second decision. We repeat the same steps with the A column, then the D column, and so on through the E column. By the end, we have a new set of probabilities:
M = [0.185, 0.2725, 0.14, 0.1125, 0.2125, 0.0775]

These are our chances of being in each state on our second decision, P2, given that we started by moving. To find the chances for our third decision, we repeat the process using P2 and our original TPM, and so on. If we extend this to our 10th decision, we get:
M = [0.213, 0.280, 0.154, 0.117, 0.099, 0.137]
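The repeated multiplication is easy to hand to a computer. This plain-Python sketch (no libraries; `matmul` and `n_step` are our own helper names) computes P^n by multiplying by the TPM one decision at a time and prints row M for n = 2 and n = 10, which should reproduce the P2 and P10 values above up to rounding.

```python
STATES = ["M", "A", "D", "C", "S", "E"]
P = [
    [0.10, 0.45, 0.20, 0.25, 0.00, 0.00],  # M
    [0.30, 0.35, 0.15, 0.10, 0.00, 0.10],  # A
    [0.20, 0.35, 0.20, 0.15, 0.00, 0.10],  # D
    [0.00, 0.00, 0.05, 0.05, 0.85, 0.05],  # C
    [0.15, 0.05, 0.05, 0.00, 0.00, 0.75],  # S
    [0.45, 0.20, 0.20, 0.05, 0.00, 0.10],  # E
]


def matmul(X, Y):
    """Matrix product: (XY)[i][j] = sum over k of X[i][k] * Y[k][j]."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def n_step(P, n):
    """Return P^n for n >= 1: entry [i][j] is the chance of being in j, n decisions after i."""
    result = P
    for _ in range(n - 1):
        result = matmul(result, P)  # P^k = P^(k-1) * P
    return result


m = STATES.index("M")
for n in (2, 10):
    row = n_step(P, n)[m]
    print(f"P{n} from M:", [round(p, 3) for p in row])
```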

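This P10 set tells us a few things. First, it gives the odds of being in each state on our 10th decision, given that we began by moving. In this case, the odds of moving on our 10th decision are 0.213, about 21%, and the odds of attacking are 0.280, or 28%. Second, this TPM is irreducible (every state can eventually reach every other state, so the boss never gets stuck in one corner of its behavior) and aperiodic (it isn't locked into repeating on a fixed cycle). For a chain like that, every row of P^n settles toward the same set of values as n grows, no matter which state we started in. Those settled values, called the stationary distribution, are the long-run share of decisions the boss spends in each state, and by the 10th decision our numbers are already very close to them. So this set also serves as the breakdown of how the boss spends its combat.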

In this case, it's telling us that, over a long fight, the boss will spend about 21.3% of its decisions moving, 28% attacking, 15.4% defending, 11.7% charging, 9.9% using special attacks, and 13.7% exhausted.
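The long-run split can also be computed directly instead of by picking a large n. One common approach, sketched here in plain Python under the irreducible-and-aperiodic assumption above, is power iteration: keep applying the TPM to a distribution until it stops changing. The result π satisfies π = πP; the helper name `stationary` is our own.

```python
STATES = ["M", "A", "D", "C", "S", "E"]
P = [
    [0.10, 0.45, 0.20, 0.25, 0.00, 0.00],  # M
    [0.30, 0.35, 0.15, 0.10, 0.00, 0.10],  # A
    [0.20, 0.35, 0.20, 0.15, 0.00, 0.10],  # D
    [0.00, 0.00, 0.05, 0.05, 0.85, 0.05],  # C
    [0.15, 0.05, 0.05, 0.00, 0.00, 0.75],  # S
    [0.45, 0.20, 0.20, 0.05, 0.00, 0.10],  # E
]


def stationary(P, tol=1e-12, max_iter=10_000):
    """Return the distribution pi with pi = pi * P, found by power iteration.

    Assumes the chain is irreducible and aperiodic, so repeatedly applying P
    converges to the same distribution from any starting point.
    """
    n = len(P)
    pi = [1.0 / n] * n  # any starting distribution works; start uniform
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("no convergence: is the chain irreducible and aperiodic?")


for name, share in zip(STATES, stationary(P)):
    print(f"{name}: {share:.1%}")
```

Because it never picks a particular starting state, this is a handy check that the 10-decision numbers really have settled.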

We could push this to 100 decisions to really hammer the point home, but the numbers would barely change because they have already settled. What matters more is how many decisions you expect your boss to make in one fight: over a long fight, these percentages describe how it spends its time, while in a very short one the starting state still has a noticeable pull. Ultimately, this system lets us predict which states our boss will spend most of its time in, allowing us to tweak long-term combat behavior with just a few numbers. Our analysis shows that, over a full fight, our boss spends most of its time attacking and repositioning, while its stronger abilities happen less often and are balanced out by telegraphing and exhaustion.
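A quick way to sanity-check the analysis, and to see how far a single short fight can wander from it, is to simulate decisions. This sketch assumes Python's `random` module; `simulate` is a hypothetical helper that plays out a number of decisions from a starting state and reports the share of decisions spent in each state.

```python
import random
from collections import Counter

STATES = ["M", "A", "D", "C", "S", "E"]
P = [
    [0.10, 0.45, 0.20, 0.25, 0.00, 0.00],  # M
    [0.30, 0.35, 0.15, 0.10, 0.00, 0.10],  # A
    [0.20, 0.35, 0.20, 0.15, 0.00, 0.10],  # D
    [0.00, 0.00, 0.05, 0.05, 0.85, 0.05],  # C
    [0.15, 0.05, 0.05, 0.00, 0.00, 0.75],  # S
    [0.45, 0.20, 0.20, 0.05, 0.00, 0.10],  # E
]


def simulate(P, states, start, decisions, seed=None):
    """Play out `decisions` boss decisions from `start`; return the share spent in each state."""
    rng = random.Random(seed)
    i = states.index(start)
    counts = Counter()
    for _ in range(decisions):
        # The next state depends only on the current row of P.
        i = rng.choices(range(len(states)), weights=P[i], k=1)[0]
        counts[states[i]] += 1
    return {s: counts[s] / decisions for s in states}


short_fight = simulate(P, STATES, "M", 20, seed=7)
long_run = simulate(P, STATES, "M", 100_000, seed=7)
print({s: round(v, 2) for s, v in short_fight.items()})
print({s: round(v, 3) for s, v in long_run.items()})
```

With many decisions, the shares should land close to the P10 percentages; a 20-decision fight can come out noticeably different, which is worth keeping in mind when tuning short encounters.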

Overall, this should highlight how stochastic systems can help developers design enemy behavior that feels alive and unpredictable to players, simulating simple decision-making. These decisions can be enhanced dramatically by adjusting the transition probabilities based on things like the player's actions or the current combat phase, balanced out by hard rules such as "if the boss lost a lot of health, cancel what he's doing and defend immediately." This leads to a behavioral model that is still probabilistic but reactive, making the boss feel like it's thinking on its feet.
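One way to sketch those ideas in code, with every tuning value made up for illustration: scale some next-state probabilities during a hypothetical second phase and renormalize the row, and check a hard rule before rolling the dice at all. The names `reweight`, `decide`, and `hp_lost_fraction`, the 25% threshold, and the 1.5x Charging boost are all assumptions, not part of the design above. Because the rule is only checked when the boss makes a decision, a true "cancel" would also need the game to interrupt the current action.

```python
import random

STATES = ["M", "A", "D", "C", "S", "E"]
P = [
    [0.10, 0.45, 0.20, 0.25, 0.00, 0.00],  # M
    [0.30, 0.35, 0.15, 0.10, 0.00, 0.10],  # A
    [0.20, 0.35, 0.20, 0.15, 0.00, 0.10],  # D
    [0.00, 0.00, 0.05, 0.05, 0.85, 0.05],  # C
    [0.15, 0.05, 0.05, 0.00, 0.00, 0.75],  # S
    [0.45, 0.20, 0.20, 0.05, 0.00, 0.10],  # E
]


def reweight(row, states, boosts):
    """Multiply chosen next-state probabilities by a factor, then renormalize to sum to 1."""
    scaled = [p * boosts.get(s, 1.0) for s, p in zip(states, row)]
    total = sum(scaled)
    return [p / total for p in scaled]


def decide(P, states, current, hp_lost_fraction, phase, rng=random):
    """One boss decision: hard rule first, then a (possibly phase-adjusted) Markov step."""
    if hp_lost_fraction >= 0.25:  # hypothetical hard rule: big burst of damage -> defend now
        return "D"
    row = P[states.index(current)]
    if phase == 2:  # hypothetical phase 2: the boss charges more often
        row = reweight(row, states, {"C": 1.5})
    return rng.choices(states, weights=row, k=1)[0]


print(decide(P, STATES, "A", hp_lost_fraction=0.40, phase=1))  # hard rule wins: D
print(decide(P, STATES, "M", hp_lost_fraction=0.05, phase=2, rng=random.Random(3)))
```

Scaling and then renormalizing keeps zero entries at zero, so a phase change can never open a path to a Special attack that skips Charging.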

Hopefully this paper highlighted how Markov Chains can be used effectively to model boss behavior, while covering their major advantage: the analysis. The downside of this approach is upkeep as the boss grows. It still works with more states, but the table grows with the square of the number of states, and every new state means redistributing the probabilities in each row that can reach it so that every row still sums to 1. Given this, I recommend taking ample time to plan what kinds of actions you want your boss to take, which probabilities need to be dynamic, and how those will be balanced out.
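That rebalancing chore can at least be automated. Here is one possible approach, sketched in plain Python: give chosen states a small chance of moving into a new state and scale the rest of each such row down proportionally, so it still sums to 1. The helper `add_state` and the example Taunt (T) state with its probabilities are hypothetical.

```python
STATES = ["M", "A", "D", "C", "S", "E"]
P = [
    [0.10, 0.45, 0.20, 0.25, 0.00, 0.00],  # M
    [0.30, 0.35, 0.15, 0.10, 0.00, 0.10],  # A
    [0.20, 0.35, 0.20, 0.15, 0.00, 0.10],  # D
    [0.00, 0.00, 0.05, 0.05, 0.85, 0.05],  # C
    [0.15, 0.05, 0.05, 0.00, 0.00, 0.75],  # S
    [0.45, 0.20, 0.20, 0.05, 0.00, 0.10],  # E
]


def add_state(P, states, name, incoming, outgoing, tol=1e-9):
    """Return (new_P, new_states) with `name` appended as a new state.

    incoming: {existing_state: chance of moving into the new state}.
              Each such row's old values are scaled by (1 - chance) so it still sums to 1.
    outgoing: {state: chance} for the new state's own row; unlisted states get 0.
    """
    new_states = states + [name]
    new_P = []
    for s, row in zip(states, P):
        p = incoming.get(s, 0.0)
        new_P.append([x * (1.0 - p) for x in row] + [p])
    new_P.append([outgoing.get(s, 0.0) for s in new_states])
    for s, row in zip(new_states, new_P):
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"{s}: row sums to {sum(row)}, not 1")
    return new_P, new_states


# Hypothetical Taunt state: reachable from Attacking and Exhausted, leads to Moving or Attacking.
P_t, STATES_t = add_state(P, STATES, "T", incoming={"A": 0.10, "E": 0.05},
                          outgoing={"M": 0.5, "A": 0.5})
for s, row in zip(STATES_t, P_t):
    print(s, [round(x, 3) for x in row])
```

Proportional scaling preserves the relative odds within each row and keeps zeros at zero, so the charge-before-special rule survives; the trade-off is that every existing transition out of those states becomes slightly less likely.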

I know this approach has opened my eyes when it comes to designing these systems, and I can't wait to share that with everyone when we go to design our rhythm game. More on that soon.

Much love, Vin