AI engine game programming
A finite state machine, or FSM for short, is a fancy way of saying that some object — for example, one of our AI agents — is currently in one of several possible states, and that it can move from one state to another. There are a finite number of those states, hence the name. A real-life example is a set of traffic lights, which will go from green, to yellow, to red, and back around again.
This applies quite well to NPCs in games. A guard might have a couple of distinct states to begin with, such as Patrolling and Attacking. But imagine we want to add a few more, such as Idling and Finding Help. To do this, we consider all the states, and under each state, we list all the transitions to other states, along with the conditions necessary for them. We also need to designate an initial state so that we know what to start off with, before any other conditions may apply. This is known as a state transition table and is a comprehensive if unattractive way of representing the FSM.
Drawn as a diagram, with each arrow showing a transition between states if the condition alongside the arrow is true, this captures the essence of the decision making for that agent based on the situation it finds itself in. The Idling state will, every frame or tick, check whether the 10 second timer has expired, and if it has, trigger the transition to the Patrolling state.
That handles the transitions between states — but what about the behaviours associated with the states themselves? Broadly, these fall into two types: ongoing actions performed every frame or tick while in a state, and one-off actions performed on the transitions between states. For an example of the first type, the Patrolling state will, every frame or tick, continue to move the agent along the patrol route.
The Attacking state will, every frame or tick, attempt to launch an attack or move into a position where that is possible.
And so on. For an example of the second type, consider the transition into the Finding Help state: the agent must pick where to go to find help, and store that information so that the Finding Help state knows where to go. The senses are embodied in the data used by the transition logic. The thinking is embodied by the transitions available in each state. And the acting is carried out by the actions taken periodically within a state or on the transitions between states.
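As a rough sketch of how this might look in code (the Guard class, the tick method, and the world.enemy_in_sight flag are all hypothetical stand-ins, not any particular engine's API):

```python
import time

class Guard:
    """A minimal finite state machine: one current state, a set of
    transitions checked every tick, and an ongoing action per state."""

    def __init__(self):
        self.state = "Idling"              # the designated initial state
        self.idle_started = time.time()

    def tick(self, world):
        # Think: check the transitions available from the current state.
        if self.state == "Idling":
            if world.enemy_in_sight:
                self.state = "Attacking"
            elif time.time() - self.idle_started > 10:
                self.state = "Patrolling"
        elif self.state == "Patrolling":
            if world.enemy_in_sight:
                self.state = "Attacking"
        elif self.state == "Attacking":
            if not world.enemy_in_sight:
                self.idle_started = time.time()
                self.state = "Idling"

        # Act: perform the ongoing action for whichever state we are now in.
        if self.state == "Patrolling":
            self.move_along_patrol_route()
        elif self.state == "Attacking":
            self.attack_or_reposition()

    def move_along_patrol_route(self): ...
    def attack_or_reposition(self): ...
```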
This basic system works well, although sometimes the continual polling for the transition conditions can be expensive. For example, if every agent has to do complex calculations every frame to determine whether it can see any enemies in order to decide whether to transition from Patrolling to Attacking, that might waste a lot of CPU time.
An alternative is to move that expensive sensing work out of the per-state polling: for example, a separate vision system could perform the checks less frequently (or share results between nearby agents) and simply notify an agent when an enemy has been spotted, triggering the transition as an event. The end behaviour is identical, except for the almost unnoticeable and yet more realistic delay in responding, but the performance is better, as a result of moving the Sense part of the system to a separate part of the program. This is all well and good, but large state machines can get awkward to work with. If we wanted to broaden the Attacking state by replacing it with separate MeleeAttacking and RangedAttacking states, we would have to alter the in-bound transitions in every state, now and in future, that needs to be able to transition to an Attacking state.
You probably also noticed that there are a lot of duplicated transitions in our example. Most of the transitions in the Idling state are identical to the ones in the Patrolling state, and it would be good not to have to duplicate that work, especially if we add more states like them.
For example, we can use a separate transition table for a new Non-Combat sub-state. This is essentially the same system, except now there is a Non-Combat state which replaces Patrolling and Idling, and which is a state machine in itself, with 2 sub-states of Patrolling and Idling.
With each state potentially containing a state machine of sub-states, and those sub-states perhaps containing their own state machine (for as far down as you need to go), we have a Hierarchical Finite State Machine (HFSM for short). By grouping the non-combat behaviours we cut out a bunch of redundant transitions, and we could do the same for any new states we chose to add that might share transitions.
For example, if we expanded the Attacking state into MeleeAttacking and MissileAttacking states in future, they could be substates, transitioning between each other based on distance to enemy and ammunition availability, but sharing the exit transitions based on health levels and so on.
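A sketch of the hierarchical version, again with hypothetical names: the Non-Combat state owns its own little state machine, so the shared "enemy spotted" transition only has to exist once at the top level.

```python
class NonCombat:
    """A state that is itself a small state machine over two sub-states."""
    def __init__(self):
        self.sub_state = "Idling"
        self.idle_time = 0.0

    def tick(self, agent, world, dt):
        if self.sub_state == "Idling":
            self.idle_time += dt
            if self.idle_time > 10:
                self.sub_state = "Patrolling"
        elif self.sub_state == "Patrolling":
            agent.move_along_patrol_route()

class Guard:
    def __init__(self):
        self.state = NonCombat()           # initial top-level state

    def tick(self, world, dt):
        # The transition shared by every non-combat sub-state lives here, once.
        if isinstance(self.state, NonCombat) and world.enemy_in_sight:
            self.state = "Attacking"
        elif self.state == "Attacking" and not world.enemy_in_sight:
            self.state = NonCombat()       # re-enter the non-combat machine afresh

        if isinstance(self.state, NonCombat):
            self.state.tick(self, world, dt)
        elif self.state == "Attacking":
            self.attack_or_reposition()

    def move_along_patrol_route(self): ...
    def attack_or_reposition(self): ...
```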
Complex behaviours and sub-behaviours can be easily represented this way with a minimum of duplicated transitions. With HFSMs we get the ability to build relatively complex behaviour sets in a relatively intuitive manner. However, one slight wrinkle in the design is that the decision making, in the form of the transition rules, is tightly bound to the current state. In many games, that is exactly what you want, and careful use of a hierarchy of states can reduce the amount of transition duplication here. But sometimes we would prefer decision-making logic that exists independently of the states themselves, so that it can be reused and rearranged more freely.
This is where a behaviour tree comes in. A behaviour tree is also built from a small set of node types, but the decision logic lives in the structure of the tree rather than in transitions between states: leaf nodes perform actions or test conditions, composite nodes such as Sequences and Selectors decide which of their children to run based on whether those children succeed or fail, and decorator nodes modify the behaviour of a single child. This small set of nodes can be combined to produce a large number of complex behaviours, and is often a very concise representation. Our guard example can be rebuilt as such a tree, with the combat behaviour taking priority over patrolling and idling. Behaviour trees can be tricky to design, as there are often many different ways to draw up the tree, and finding the right combination of decorator and composite nodes takes practice.
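A minimal hand-rolled sketch of those node types (hypothetical classes; decorators and the "running" status that real implementations track are omitted):

```python
class Node:
    def run(self, agent):
        raise NotImplementedError

class Condition(Node):
    def __init__(self, test): self.test = test
    def run(self, agent): return self.test(agent)       # True means success

class Action(Node):
    def __init__(self, act): self.act = act
    def run(self, agent): self.act(agent); return True

class Sequence(Node):
    """Succeeds only if every child succeeds, in order."""
    def __init__(self, *children): self.children = children
    def run(self, agent):
        return all(child.run(agent) for child in self.children)

class Selector(Node):
    """Tries children in order, succeeding on the first one that succeeds."""
    def __init__(self, *children): self.children = children
    def run(self, agent):
        return any(child.run(agent) for child in self.children)

# The guard, roughly: attack if an enemy is visible, otherwise patrol.
guard_tree = Selector(
    Sequence(Condition(lambda a: a.enemy_in_sight), Action(lambda a: a.attack())),
    Action(lambda a: a.patrol()),
)
```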
There are also the issues of how often to check the tree (do we want to traverse it every frame, or only when something happens that means one of the conditions has changed?) and how we know which nodes were executing last time, so we can handle a sequence correctly.
For this reason there are many different implementations. Behaviour trees are powerful tools, but learning to use them effectively, especially in the face of multiple varying implementations, can be daunting. Another option is to score every available action and simply pick the best: a utility-based system is exactly that — a system where an agent has a variety of actions at their disposal, and they choose one to execute based on the relative utility of each action, where utility is an arbitrary measure of how important or desirable performing that action is to the agent.
By writing utility functions to calculate the utility for an action based on the current state of the agent and its environment, the agent is able to check those utility values and thereby select the most relevant state at any time. Again, this closely resembles a finite state machine except one where the transitions are determined by the score for each potential state, including the current one.
Note that we generally choose the highest-scoring action (to transition to, or remain in, if already performing that action), but for more variety it could be a weighted random selection (favouring the highest-scoring action but allowing some others to be chosen), picking a random action from the top 5 (or any other quantity), etc.
The typical utility system will assign some arbitrary range of utility values, say 0 (completely undesirable) to 100 (completely desirable), and each action might have a set of considerations that influence how that value is calculated.
Returning to our guard example, we might imagine a utility function for each action. The Idling action, for instance, might return 0 if the guard has already been idling for 10 seconds, and a modest non-zero value otherwise, while a combat action such as Attacking returns a much higher value whenever an enemy is in sight. One of the most important things to notice about this setup is that transitions between actions are completely implicit — any state can legally follow any other state.
Also, action priorities are implicit in the utility values returned. For example, the non-combat actions never return more than 50, so they will always be trumped by a combat action. Actions and their utility calculations need designing with this in mind. Our example has actions return either a fixed constant utility value, or one of two fixed utility values. A more realistic system will typically involve returning a score from a continuous range of values.
This allows relative action priorities to change based on any number of criteria, which can make this type of approach more flexible than a behaviour tree or finite state machine. Each action usually has a bunch of conditions involved in calculating the utility. To avoid hard-coding everything, these calculations may need to be written in a scripting language, or as a series of mathematical formulae aggregated together in an understandable way.
For example, if a character has a Hunger motivation, this might periodically go up over time, and the utility score for the EatFood action will return higher and higher values as time goes on, until the character is able to execute that action, decrease their Hunger level, and the EatFood action then goes back to returning a zero or near-zero utility value.
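A sketch of utility-based selection along these lines, with made-up scores and made-up agent attributes (idle_time, enemy_in_sight, hunger):

```python
from types import SimpleNamespace

def idle_utility(agent):
    # Modest score that drops to zero once we've idled long enough.
    return 0 if agent.idle_time > 10 else 25

def patrol_utility(agent):
    return 30

def attack_utility(agent):
    # Combat actions always outrank non-combat ones (which stay below 50).
    return 90 if agent.enemy_in_sight else 0

def eat_utility(agent):
    # A continuous consideration: utility rises as hunger (0..1) grows.
    return 80 * agent.hunger

ACTIONS = {"Idle": idle_utility, "Patrol": patrol_utility,
           "Attack": attack_utility, "EatFood": eat_utility}

def choose_action(agent):
    # Pick the highest-scoring action; a weighted random pick over the
    # scores would give more variety.
    return max(ACTIONS, key=lambda name: ACTIONS[name](agent))

agent = SimpleNamespace(idle_time=12.0, enemy_in_sight=False, hunger=0.7)
print(choose_action(agent))   # -> "EatFood": 56 beats Patrol (30) and Idle (0)
```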
The idea of choosing actions based on a scoring system is quite straightforward, so it is obviously possible — and common — to use utility-based decision making as a part of other AI decision-making processes, rather than as a full replacement for them. A decision tree could query the utility score of its two child nodes and pick the higher-scoring one. Similarly a behaviour tree could have a Utility composite node that uses utility scores to decide which child to execute.
In our previous examples, we either had a simple paddle which we told to move left or right, or a guard character who was told to patrol or attack. But how exactly do we handle movement of an agent over a period of time? How do we set the speed, how do we avoid obstacles, and how do we plan a route when getting to the destination is more complex than moving directly?
At a very basic level, it often makes sense to treat each agent as having a velocity value, which encompasses both how fast it is moving and in which direction. This velocity might be measured in metres per second, miles per hour, pixels per second, etc. If we know where an agent wishes to be, then we will want to use our velocity to move the agent in that direction.
Very trivially, we have an equation like this: desired travel = destination position - agent position. So, imagining a 2D world where the agent is at (-2, -2) and the destination is somewhere roughly to the north-east at (30, 20), the desired travel for the agent to get there is (32, 22). With that set, and movement being based on that value (capped at a sensible maximum speed, say 5 units per second), the agent would cover the roughly 39 units of distance and arrive at the destination a little under 8 seconds later, as we would expect.
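A quick sketch of that calculation, assuming simple 2D tuples and a hypothetical maximum speed of 5 units per second:

```python
import math

def desired_velocity(position, destination, max_speed=5.0):
    # Desired travel is simply destination minus current position.
    dx, dy = destination[0] - position[0], destination[1] - position[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        return (0.0, 0.0)
    # Head in that direction at (up to) the agent's maximum speed.
    scale = min(max_speed, distance) / distance
    return (dx * scale, dy * scale)

# Agent at (-2, -2), destination at (30, 20): desired travel is (32, 22),
# about 38.8 units away, so roughly 7.8 seconds at 5 units per second.
print(desired_velocity((-2, -2), (30, 20)))
```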
The calculations can be re-run whenever you want. This also works for moving targets (within reason), allowing the agent to make small corrections as they move. Often we want a bit more control than this — for example, we might want to ramp up the velocity slowly at the start to represent a person moving from a standstill, through to walking, and then to running.
We might want to do the same at the other end, to have them slow down to a stop as they approach the destination. These movement capabilities are often implemented as "steering behaviours", each of which has a slightly different purpose. Seek and Arrive are ways of moving an agent towards a destination point. Obstacle Avoidance and Separation help an agent take small corrective movements to steer around small obstacles between the agent and its destination.
Alignment and Cohesion keep agents moving together to simulate flocking animals. Any number of these different steering behaviours can be added together, often as a weighted sum, to produce an aggregate value that takes these different concerns into account and produces a single output vector.
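A sketch of that weighted sum; the individual behaviour outputs here are made-up numbers standing in for the results of Arrive, Separation and Obstacle Avoidance:

```python
def combine_steering(weighted_outputs):
    """Blend the output vectors of several steering behaviours into one.

    weighted_outputs: list of (weight, (x, y)) pairs, one per behaviour,
    for example the results of Arrive, Separation and Obstacle Avoidance.
    """
    x = sum(w * vx for w, (vx, vy) in weighted_outputs)
    y = sum(w * vy for w, (vx, vy) in weighted_outputs)
    return (x, y)   # the single output vector applied to the agent

# Example: arrival pulls north-east, separation pushes gently west,
# obstacle avoidance pushes south; the blend is a single compromise vector.
print(combine_steering([(1.0, (3.0, 3.0)), (1.5, (-1.0, 0.0)), (2.0, (0.0, -0.5))]))
```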
For example, you might see a typical character agent use an Arrival steering behaviour alongside a Separation behaviour and an Obstacle Avoidance behaviour to keep away from walls and other agents. However, a simple blend like this can sometimes produce poor results, with competing behaviours largely cancelling each other out. Therefore it sometimes makes sense to consider variations on steering behaviours which are more sophisticated than just adding together all the values. One family of approaches is to think of steering the other way around: instead of having each of the behaviours give us a direction and then combining them to reach a consensus (which may itself not be adequate), we could instead consider steering in several different directions (such as 8 compass points, or 5 or 6 directions in front of the agent) and see which one of those is best.
But what about when the route to the destination is more complex? A simple approach is to overlay a grid on the world and, starting from the agent's position, look at the neighbouring squares it can reach. If one of them is the destination, follow the route back from each square to its predecessor until you reach the start. Otherwise, repeat the process with the reachable neighbours of the previous neighbours, until you find the destination or run out of squares (which means there is no possible route).
The search space is like a wavefront that moves out from the start until it hits the place that is being searched for: the search area expands at each step until it has included the destination point, and then the path back to the start can be traced. This approach is known as a breadth-first search.
The result here is that you get a list of gridsquares which make up the route you need to take. The simplest way to follow it is to steer towards the centre of the next gridsquare, but a popular alternative is to steer for the middle of the edge between the current square and the next one. This allows the agent to cut corners on sharp turns, which can make for more realistic-looking movement.

A popular refinement of this search is the A* ("A star") algorithm. It works much the same way as breadth-first search, except that instead of blindly exploring neighbours, then neighbours of neighbours, then neighbours of neighbours of neighbours, and so on, it puts all these nodes into a list and sorts them so that the next node it explores is always the one it thinks is most likely to lead to the shortest route.
The nodes are sorted based on a heuristic — basically an educated guess — that takes into account 2 things: the cost of the hypothetical route to that gridsquare (thus incorporating any movement costs you need) and an estimate of how far that gridsquare is from the destination (thus biasing the search in the right direction).
In this example we show it examining one square at a time, each time picking a neighbouring square that is the best or joint best prospect. The resulting path is the same as with breadth-first search, but fewer squares were examined in the process — and this makes a big difference to the game's performance on complex levels.
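A compact sketch of this kind of search over a grid of walkable squares (a hypothetical map representation; the heuristic here is straight-line distance to the destination):

```python
import heapq, math

def astar(walkable, start, goal):
    """walkable: set of (x, y) grid squares; returns a list of squares or None."""
    def h(square):  # heuristic: straight-line distance to the goal
        return math.hypot(goal[0] - square[0], goal[1] - square[1])

    frontier = [(h(start), 0, start)]        # (estimated total, cost so far, square)
    came_from, best_cost = {start: None}, {start: 0}
    while frontier:
        _, cost, current = heapq.heappop(frontier)
        if current == goal:                  # trace the route back to the start
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (current[0] + dx, current[1] + dy)
            if nxt in walkable and cost + 1 < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = cost + 1
                came_from[nxt] = current
                heapq.heappush(frontier, (cost + 1 + h(nxt), cost + 1, nxt))
    return None   # ran out of squares: no possible route
```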
The previous examples used a grid overlaid on the world and plotted a route across the world in terms of the gridsquares. But most games are not laid out on a grid, and overlaying a grid might therefore not make for realistic movement patterns. So, what are the alternatives? Luckily, the search itself does not care about grids at all; it only needs a set of nodes and the connections between them. So we could decide to place the nodes at arbitrary positions in the world, and providing there is a walkable straight line between any two connected nodes, and between the start and end positions and at least one of the nodes, our algorithm will work just as well as before — usually better, in fact, because we have fewer nodes to search.
Example 1: a node in every grid square. The search starts from the node the agent is in, and ends at the node in the target gridsquare. Example 2: a much smaller set of nodes, or waypoints. The search begins at the agent, passes through as many waypoints as necessary, and then proceeds to the end point. Note that moving to the first waypoint, south-west of the player, is an inefficient route, so some degree of post-processing of a path generated in this way is usually necessary (for example, to spot that the path can go directly to the waypoint to the north-east).
This is quite a flexible and powerful system. But it usually requires some care in deciding where and how to place the waypoints, otherwise agents might not be able to see their nearest waypoint and start a path. It would be great if we could generate the waypoints automatically based on the world geometry somehow. This is where a navigation mesh, or "navmesh" for short, comes in: a mesh of polygons (often triangles) covering the areas of the world that agents are allowed to walk on. Each of the triangles in the mesh becomes a node in the graph and has up to 3 adjacent triangles which become neighbouring nodes in the graph.
This picture is an example from the Unity engine — it has analysed the geometry in the world and produced a navmesh (light blue) that is an approximation of it. Each polygon in the navmesh is an area that an agent can stand on, and the agent is able to move from one polygon to any polygon adjacent to it. Pathfinding is a wide subject and there are many different approaches to it, especially if you have to code the low-level details yourself. It is also a special case of a more general idea: we can generalise it to a wide range of problems where achieving your goal is not just about the next step, but about a series of steps that are necessary to get there, and where you might need to look ahead several steps in order to know what the first one should be.
This is what we call planning. Pathfinding can be thought of as one specific application of planning, but there are many more applications for the concept. Take a game of Magic: The Gathering, where our hand contains a Swamp card, a Forest card, and an Elvish Mystic creature that needs 1 green mana to summon. In this situation a human player would probably know to play the Forest, tap it for 1 point of Green Mana, and then summon the Elvish Mystic. But how would a game AI know to make that decision? The naive approach might be to just keep trying each action in turn until no appropriate ones are left.
Looking at the hand of cards, it might see that it can play the Swamp, so it does so. After that, does it have any other actions left this turn? It cannot play another land card, and the black mana from the Swamp cannot pay for the Elvish Mystic, so it ends the turn. The AI player has produced a valid turn, but not a very good one. Luckily, we can do better. In much the same way that pathfinding finds a list of positions to move through the world to reach a desired position, our planner can find a list of actions that get the game into a desired state.
We can search through these actions and successor actions until we reach the state that we want. The start of the turn sees us with only 2 potential actions allowed by the rules of the game: Play the Swamp, or Play the Forest. Each action taken may enable further actions and close off others, again depending on the game rules.
Imagine we choose to play the Swamp — this removes that action as a potential successor (as the Swamp has already been played), it removes Play the Forest as a successor (because the game rules only allow you to play one Land card per turn), and it adds Tap the Swamp for 1 point of Black Mana as a successor — the only successor, in fact.
So, we repeat the process for the next action. If we had instead chosen Play the Forest, the only successor would be Tap the Forest for 1 point of Green Mana. That gives us 1 green mana, which in this case opens up a third step, that of Summon Elvish Mystic. Typically you might score potential plans based on the final outcome or the cumulative benefit of following the plan. For example, you might award yourself 1 point for playing a Land card and 3 points for summoning a creature.
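A sketch of that search over actions, using a tiny made-up model of the turn rather than the real Magic: The Gathering rules:

```python
def remove(items, item):
    out = list(items); out.remove(item); return tuple(out)

def successors(state):
    """Yield (action, new_state) pairs legal from the given toy game state."""
    hand, untapped, mana, land_played, creatures = state
    if not land_played:
        for card in ("Swamp", "Forest"):
            if card in hand:
                yield ("Play the " + card,
                       (remove(hand, card), untapped + (card,), mana, True, creatures))
    for land in set(untapped):
        colour = "Black" if land == "Swamp" else "Green"
        yield ("Tap the " + land + " for " + colour + " mana",
               (hand, remove(untapped, land), mana + (colour,), land_played, creatures))
    if "Elvish Mystic" in hand and "Green" in mana:
        yield ("Summon Elvish Mystic",
               (remove(hand, "Elvish Mystic"), untapped, remove(mana, "Green"),
                land_played, creatures + ("Elvish Mystic",)))

def action_score(action):
    # 1 point for playing a land, 3 points for summoning a creature.
    return 3 if action.startswith("Summon") else 1 if action.startswith("Play") else 0

def best_plan(state):
    """Exhaustively search every sequence of actions and keep the best-scoring one."""
    best = ([], 0)
    for action, nxt in successors(state):
        plan, value = best_plan(nxt)
        plan, value = [action] + plan, action_score(action) + value
        if value > best[1]:
            best = (plan, value)
    return best

start = (("Swamp", "Forest", "Elvish Mystic"), (), (), False, ())
print(best_plan(start))   # -> Play the Forest, Tap the Forest, Summon Elvish Mystic: 4 points
```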
The Play the Forest, Tap the Forest, Summon Elvish Mystic plan is worth 4 points under that scheme, making it the top scoring plan on offer, and therefore it would be chosen, if that was how we were scoring them. Sometimes there are just too many possible actions at each step for it to be reasonable for us to consider every permutation. Returning to the Magic: The Gathering example - imagine if we had several creatures in our hand, plenty of land already in play so we could summon any of them, creatures already in play with some abilities available, and a couple more land cards in our hand — the number of permutations of playing lands, tapping lands, summoning creatures, and using creature abilities could number in the thousands or even tens of thousands.
Luckily we have a couple of ways to attempt to manage this. The first is to work backwards, sometimes called backwards chaining: instead of trying all the actions and seeing where they lead, we start with each of the final results that we want and see if we can find a direct route there. An analogy is trying to reach a specific leaf from the trunk of a tree — it makes more sense to start from that leaf and work backwards, tracing the route easily to the trunk (which we could then follow in reverse), rather than starting from the trunk and trying to guess which branch to take at each step.
By starting at the end and going in reverse, forming the plan can be a lot quicker and easier. For example, if the goal is to deal damage to the opponent this turn, our system knows it needs to cast a direct damage spell, which in turn means that it needs to have one in the hand and have enough mana to cast it, which in turn means it must be able to tap sufficient land to get that mana, which might require it to play an additional land card. This often allows us to form a plan that is optimal, or at least good enough, without needing to consider every possible permutation of plans. The second way to manage the explosion of permutations is a best-first search: much as with pathfinding, we explore the most promising-looking partial plans first, guided by an estimate of how good each one is.
An interesting and increasingly popular variant on best-first search is Monte Carlo Tree Search. Instead of attempting to guess which plans are better than others while selecting each successive action, it picks random successors at each step until it reaches the end when no more actions are available — perhaps because the hypothetical plan led to a win or lose condition — and uses that outcome to weight the previous choices higher or lower.
By repeating this process many times in succession it can produce a good estimate of which next step is best, even if the situation changes (such as the opponent taking preventative action to thwart us). Another name that comes up often in discussions of planning is Goal-Oriented Action Planning, or GOAP. It is a widely-used and widely-discussed technique, but apart from a few specific implementation details it is essentially a backwards-chaining planner that starts with a goal and attempts to pick an action that will meet that goal, or, more likely, a list of actions that will lead to the goal being met.
So far, all of these decision-making approaches rely on rules fixed up front, but sometimes we would like the AI to adapt based on what actually happens during play. We might want a computer opponent in a shooter game to learn the best places to go in order to score the most kills. So there are times when some degree of machine learning can be useful. For example, say we had a real-time strategy game and we were trying to guess whether a player is likely to launch a rush attack in the first few minutes or not, so we can decide whether we need to build more defences.
To begin with we have no data about the player from which to extrapolate — but each time the AI plays against the human opponent, it can record the time of the first attack.
After several plays, those times can be averaged and will be a reasonably good approximation of when that player might attack in future. The problem with simple averages is that they tend towards the centre over time, so if a player employed a rushing strategy the first 20 times and switched to a much slower strategy the next 20 times, the average would be somewhere in the middle, telling us nothing useful. One way to rectify this is a simple windowed average, such as only considering the last 20 data points.
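A sketch of that kind of windowed tracking (hypothetical helper class):

```python
from collections import deque

class RushTimePredictor:
    """Average of the last N observed first-attack times for one opponent."""
    def __init__(self, window=20):
        self.times = deque(maxlen=window)   # old entries fall off automatically

    def record_first_attack(self, seconds_into_game):
        self.times.append(seconds_into_game)

    def predicted_attack_time(self, default=300.0):
        return sum(self.times) / len(self.times) if self.times else default
```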
A similar approach can be used when estimating the probability of certain actions happening, by assuming that past player preferences will carry forward into the future. For example, if a player opens with a fire spell far more often than with anything else, it is a fair bet they will do so again, and our AI characters would be well advised to find some fire-retardant armour!
Another interesting method is to use a Naive Bayes Classifier to examine large amounts of input data and try to classify the current situation so the AI agent can react appropriately. Bayesian classifiers are perhaps best known for their use in email spam filters, where they examine the words in the email, compare them to whether those words mostly appeared in spam or non-spam in the past, and use that to make a judgement on whether this latest email is likely to be spam.
We could do similarly, albeit with less input data. By recording all the useful information we see (such as which enemy units are constructed, which spells they use, or which technologies they have researched) and then noting the resulting situation (war vs peace, rush strategy vs defensive strategy, etc.), we can build a classifier that looks at a fresh set of observations and estimates which situation we are most likely facing, letting the AI respond accordingly.
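A sketch of a naive Bayes classifier over observations like these; the feature and label names are made up:

```python
import math
from collections import defaultdict

class NaiveBayes:
    def __init__(self):
        self.label_counts = defaultdict(int)                    # e.g. "rush", "defensive"
        self.feature_counts = defaultdict(lambda: defaultdict(int))

    def train(self, features, label):
        """features: a set of observed facts, e.g. {"many_barracks", "early_scout"}."""
        self.label_counts[label] += 1
        for f in features:
            self.feature_counts[label][f] += 1

    def classify(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log P(label) plus log P(feature | label) for each observed feature,
            # with add-one smoothing so unseen features don't zero everything out.
            score = math.log(count / total)
            for f in features:
                score += math.log((self.feature_counts[label][f] + 1) / (count + 2))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
nb.train({"many_barracks", "early_scout"}, "rush")
nb.train({"expansion", "few_units"}, "defensive")
print(nb.classify({"many_barracks"}))   # -> "rush"
```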
With all of these learning approaches, it might be sufficient — and often preferable — to run them on the data gathered during playtesting prior to release. By comparison, AI that adapts to the player after release might end up becoming too predictable or even too difficult to beat. Rather than just using the input data to choose between discrete pre-programmed strategies, maybe we want to change a set of values that inform our decision making.
Given a good understanding of our game world and game rules, we can adjust such values as we observe what happens in play. For example, imagine the computer agent has several major rooms in an FPS map to choose from. Each room has a weight which determines how desirable that room is to visit, all starting at the same value. When choosing where to go, it picks a room at random, but biased based on those weights.
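A sketch of that weighted random choice (hypothetical helper; the adjust method lets feedback from play nudge the weights up or down):

```python
import random

class RoomPicker:
    def __init__(self, rooms):
        self.weights = {room: 1.0 for room in rooms}   # all equally desirable at first

    def choose(self):
        # Random choice, biased by the current weights.
        rooms = list(self.weights)
        return random.choices(rooms, weights=[self.weights[r] for r in rooms])[0]

    def adjust(self, room, factor):
        # e.g. adjust(room, 0.8) when something bad happened to us there.
        self.weights[room] *= factor

picker = RoomPicker(["Red", "Green", "Blue"])
destination = picker.choose()
picker.adjust(destination, 0.8)
```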
Now imagine that when the computer agent is killed, it notes which room it was in and decreases the weight, so it is less likely to come back here in future. What if we wanted to use data collected like this to make predictions? For example, if we record each room we see a player in over a period of time as they play the game, we might reasonably expect to use that to predict which room the player might move to next.
By keeping track of both the current room the player is in and the previous room they were seen in, and recording that as a pair of values, we can calculate how often each of the former situations leads to the latter situation and use that for future predictions. Imagine there are 3 rooms, Red, Green, and Blue, and we record those pairs over the course of a play session.
Simply counting how often the player is seen in each room is of limited use by itself; it might be skewed by players spawning evenly across the map, equally likely to emerge into any of those three rooms.
We can also see that the Blue room is quite an undesirable destination — people rarely pass from the Red or Green rooms to the Blue room, and nobody seems to linger in the Blue room at all. But the data tells us something more specific — it says that when a player is in the Blue room, the next room we see them in is most likely to be the Red room, not the Green room.
Despite the Green room being a more popular destination overall than the Red one, that trend is slightly reversed if the player is currently in the Blue room. Predicting the next state (in this case, the next room) purely from the current state in this way gives us what is known as a Markov Chain. Since Markov Chains represent the chance of changes between successive states, they are often visually represented as a finite state machine with the probability shown alongside each transition.
Previously we used a state machine to represent some sort of behavioural state that an agent was in, but the concept extends to any sort of state, whether to do with an agent or not. In this case the states represent the room being occupied, and each transition carries the probability of moving from one room to the next. This is a simple approach to represent the relative chance of different state transitions, giving the AI some predictive power regarding the next state. But we could go further, by using the system to look 2 or more steps into the future.
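A sketch of building those transition probabilities from observed (previous room, next room) pairs and then using them to look several steps ahead; the observation data here is made up:

```python
from collections import Counter, defaultdict

observations = [("Red", "Green"), ("Red", "Green"), ("Green", "Green"),
                ("Green", "Red"), ("Blue", "Red"), ("Green", "Blue"), ("Red", "Red")]

# Estimate P(next room | current room) from how often each pair was seen.
counts = defaultdict(Counter)
for current, nxt in observations:
    counts[current][nxt] += 1
transitions = {room: {nxt: n / sum(c.values()) for nxt, n in c.items()}
               for room, c in counts.items()}

def predict(room, steps):
    """Probability distribution over rooms after the given number of steps."""
    dist = {room: 1.0}
    for _ in range(steps):
        new_dist = defaultdict(float)
        for current, p in dist.items():
            for nxt, q in transitions.get(current, {}).items():
                new_dist[nxt] += p * q   # multiply out the probabilities at each step
        dist = new_dist
    return dict(dist)

print(predict("Blue", 1))   # where is a player last seen in Blue most likely to be next?
print(predict("Blue", 2))   # ...and the step after that?
```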
Laying the calculation out as a table is just a visual aid — the procedure requires only that you multiply out the probabilities at each step. This means you could look a long way into the future, with one significant caveat: we are making an assumption that the chance of entering a room depends entirely on the current room the player is in.
This is what we call the Markov Property — the idea that a future state depends only on the present state. While it allows us to use powerful tools like this Markov Chain, it is usually only an approximation. What about our fighting game combo-spotting example? This is a similar situation, where we want to predict a future state based on the past state in order to decide how to block or evade an attack , but rather than looking at a single state or event, we want to look for sequences of events that make up a combo move.
One way to do this is to store each input (such as Kick, Punch, or Block) in a buffer and record the whole buffer as the event. It would be possible to look at all the times that the player chose Kick followed by another Kick in the past, and then notice that the next input is always Punch. This lets the AI agent make a prediction that if the player has just chosen Kick followed by Kick, they are likely to choose Punch next, thereby triggering the SuperDeathFist. This allows the AI to consider picking an action that counteracts that, such as a block or evasive action.
These sequences of events are known as N-grams where N is the number of items stored. In the previous example it was a 3-gram, also known as a trigram, which means the first 2 entries are used to predict the 3rd one.
In a 5-gram, the first 4 entries would hopefully predict the 5th, and so on. Lower numbers require less memory, as there are fewer possible permutations, but they store less history and therefore lose context. On the other hand, higher numbers require more memory and are likely to be harder to train, as you will have many more possible permutations and therefore you might never see the same one twice.
For example, if you had the 3 possible inputs of Kick, Punch, or Block and were using 10-grams, then you would have almost 60,000 different permutations. Tri-grams and larger N-grams can also be thought of as Markov Chains, where all but the last item in the N-gram together form the first state and the last item is the second state. Our fighting game example is representing the chance of moving from the Kick then Kick state to the Kick then Punch state.
By treating multiple entries of input history as a single unit, we are essentially transforming the input sequence into one piece of state, which gives us the Markov Property — allowing us to use Markov Chains to predict the next input, and thus to guess which combo move is coming next.
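A sketch of a trigram predictor along these lines (hypothetical input names):

```python
from collections import Counter, defaultdict

class NGramPredictor:
    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)   # (first n-1 inputs) -> counts of the nth
        self.recent = []

    def observe(self, move):
        self.recent.append(move)
        if len(self.recent) >= self.n:
            window = tuple(self.recent[-self.n:])
            self.counts[window[:-1]][window[-1]] += 1

    def predict(self):
        """Most likely next input given the last n-1 inputs seen, or None."""
        key = tuple(self.recent[-(self.n - 1):])
        if len(key) < self.n - 1 or not self.counts[key]:
            return None
        return self.counts[key].most_common(1)[0][0]

predictor = NGramPredictor(n=3)
for move in ["Kick", "Kick", "Punch", "Block", "Kick", "Kick", "Punch"]:
    predictor.observe(move)
predictor.observe("Kick")
predictor.observe("Kick")
print(predictor.predict())   # -> "Punch": time to block the SuperDeathFist
```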
But how do we observe a whole game world effectively? We saw earlier that the way we represent the geography of the world can have a big effect on how we navigate it, so it is easy to imagine that this holds true for other aspects of game AI as well. How do we gather and organise all the information we need in a way that performs well (so it can be updated often and used by many agents) and is practical (so that the information is easy to use with our decision-making)?
How do we turn mere data into information or knowledge? This will vary from game to game, but there are a few common approaches that are widely used. Sometimes we already have a ton of usable data at our disposal and all we need is a good way to categorise and search through it. For example, maybe there are lots of objects in the game world and some of them make for good cover to avoid getting shot.
Or maybe we have a bunch of voice-acted lines, all only appropriate in certain situations, and we want a way to quickly know which is which. The obvious approach is to attach a small piece of extra information that we can use in the searches, and these are called tags.
Take the cover example; a game world may have a ton of props — crates, barrels, clumps of grass, wire fences. Some of them are suitable as cover, such as the crates and barrels, and some of them are not, such as the grass and the wire fence. Rather than having the AI analyse the geometry at runtime, whoever builds the level can simply attach a "Cover" tag to the suitable objects. Once they do this for all your barrels and intact crates, your AI routine only has to search for any object with that tag, and it can know that it is suitable.
This tag will still work if objects get renamed later, and can be added to future objects without requiring any extra code changes.
Some engines provide tag functionality built in (such as Unity and Unreal Engine 4), so all you have to do is decide on your set of tags and use them where necessary. Imagine a medieval city simulator, where the adventurers within wander around of their own accord, training, fighting, and resting as necessary.
A simple approach is to tag each location with the activity it offers, so an adventurer who wants to train just searches for the nearest object carrying a "Training" tag. That works, up to a point. But as the simulation grows, you find yourself needing to support multiple tags per location, looking up different animations based on which aspect of the training the adventurer needs, etc. An alternative is to turn the relationship around and have each location advertise what it offers to characters nearby, in terms of the outcome it provides: for example, a sparring ring might offer "improve your Sword skill here", an archery target "improve your Bow skill here", and a library "improve your Magic skill here". A character that wants to improve a skill asks the nearby objects what they offer and picks one that matches its needs. It can then move to the relevant place, perform the relevant animation (or any other prerequisite activity) as specified by the object, and gain the rewards accordingly.
An archer character in the vicinity of four such locations might be given six options of this kind, of which four are irrelevant, as the character is neither a sword user nor a magic user.
By matching on the outcome (in this case, the improvement in a skill) rather than on a name or a tag, we make it easy to extend our world with new behaviour.